Preface


Progress in telecommunications over the past two decades has been nothing short of revolutionary,
with communications taken for granted in modern society to the same extent as electricity.
There is therefore a persistent need for engineers who are well-versed in the principles of communication
systems. These principles apply to communication between points in space, as well as
communication between points in time (i.e., storage). Digital systems are fast replacing analog
systems  in  both  domains.  This  book  has  been  written  in  response  to  the  following  core  question: 
what  is  the  basic  material  that  an  undergraduate  student  with  an  interest  in  communications 
should  learn,  in  order  to  be  well  prepared  for  either  industry  or  graduate  school?  For  example,  a 
number  of  institutions  only  teach  digital  communication,  assuming  that  analog  communication 
is  dead  or  dying.  Is  that  the  right  approach?  From  a  purely  pedagogical  viewpoint,  there  are 
critical  questions  related  to  mathematical  preparation:  how  much  mathematics  must  a  student 
learn  to  become  well-versed  in  system  design,  what  should  be  assumed  as  background,  and  at 
what  point  should  the  mathematics  that  is  not  in  the  background  be  introduced?  Classically, 
students  learn  probability  and  random  processes,  and  then  tackle  communication.  This  does  not 
quite  work  today:  students  increasingly  (and  I  believe,  rightly)  question  the  applicability  of  the 
material  they  learn,  and  are  less  interested  in  abstraction  for  its  own  sake.  On  the  other  hand, 
I  have  found  from  my  own  teaching  experience  that  students  get  truly  excited  about  abstract 
concepts  when  they  discover  their  power  in  applications,  and  it  is  possible  to  provide  the  means 
for  such  discovery  using  software  packages  such  as  Matlab.  Thus,  we  have  the  opportunity  to 
get a new generation of students excited about this field: by covering abstractions “just in time”
to shed light on engineering design, and by reinforcing concepts immediately using software experiments
in addition to conventional pen-and-paper problem solving, we can remove the lag
between learning and application, and ensure that the concepts stick.

This textbook represents my attempt to act upon the preceding observations, and is an outgrowth
of my lectures for a two-course undergraduate elective sequence on communication at
UCSB, which is often also taken by some beginning graduate students. Thus, it can be used as
the basis for a two-course sequence in communication systems, or a single course on digital communication,
at the undergraduate or beginning graduate level. The book also provides a review
or  introduction  to  communication  systems  for  practitioners,  easing  the  path  to  study  of  more 
advanced  graduate  texts  and  the  research  literature.  The  prerequisite  is  a  course  on  signals  and 
systems,  together  with  an  introductory  course  on  probability.  The  required  material  on  random 
processes  is  included  in  the  text. 

A  student  who  masters  the  material  here  should  be  well-prepared  for  either  graduate  school  or 
the  telecommunications  industry.  The  student  should  leave  with  an  understanding  of  baseband 
and  passband  signals  and  channels,  modulation  formats  appropriate  for  these  channels,  random 
processes  and  noise,  a  systematic  framework  for  optimum  demodulation  based  on  signal  space 
concepts,  performance  analysis  and  power-bandwidth  tradeoffs  for  common  modulation  schemes, 
a hint of the power of information theory and channel coding, and an introduction to communication
techniques for dispersive channels and multiple antenna systems. Given the significant ongoing
research and development activity in wireless communication, and the fact that an understanding
of wireless link design provides a sound background for approaching other communication links,
material enabling hands-on discovery of key concepts for wireless system design is interspersed
throughout the textbook.

I should add that I firmly believe that the utility of this material goes well beyond communications,
important as that field is. Communications systems design merges concepts from signals
and systems, probability and random processes, and statistical inference. Given the broad applicability
of these concepts, a background in communications is of value in a large variety of areas
requiring “systems thinking,” as I discuss briefly at the end of Chapter 1.

The  goal  of  the  lecture-style  exposition  in  this  book  is  to  clearly  articulate  a  selection  of  concepts 
that I deem fundamental to communication system design, rather than to provide comprehensive
coverage.  “Just  in  time”  coverage  is  provided  by  organizing  and  limiting  the  material  so  that  we 
get  to  core  concepts  and  applications  as  quickly  as  possible,  and  by  sometimes  asking  the  reader 
to  operate  with  partial  information  (which  is,  of  course,  standard  operating  procedure  in  the 
real  world  of  engineering  design).  However,  the  topics  that  we  do  cover  are  covered  in  sufficient 
detail  to  enable  the  student  to  solve  nontrivial  problems  and  to  obtain  hands-on  involvement  via 
software  labs.  Descriptive  material  that  can  easily  be  looked  up  online  is  omitted. 


Organization 

•  Chapter  1  provides  a  perspective  on  communication  systems,  including  a  discussion  of  the 
transition  from  analog  to  digital  communication  and  how  it  colors  the  selection  of  material  in 
this  text. 

• Chapter 2 provides a review of signals and systems (biased towards communications applications),
and then discusses the complex baseband representation of passband signals and systems,
emphasizing its critical role in modeling, design and implementation. A software lab on modeling
and undoing phase offsets in complex baseband, while providing a sneak preview of digital modulation,
is included. Chapter 2 also includes a section on wireless channel modeling in complex
baseband using ray tracing, reinforced by a software lab which applies these ideas to simulate
link time variations for a lamppost-based broadband wireless network.

•  Chapter  3  covers  analog  communication  techniques  which  are  relevant  even  as  the  world  goes 
digital,  including  superheterodyne  reception  and  phase  locked  loops.  Legacy  analog  modulation 
techniques  are  discussed  to  illustrate  core  concepts,  as  well  as  in  recognition  of  the  fact  that 
suboptimal  analog  techniques  such  as  envelope  detection  and  limiter-discriminator  detection 
may  have  to  be  resurrected  as  we  push  the  limits  of  digital  communication  in  terms  of  speed 
and  power  consumption.  Software  labs  reinforce  and  extend  concepts  in  amplitude  and  angle 
modulation. 

•  Chapter  4  discusses  digital  modulation,  including  linear  modulation  using  constellations  such 
as  Pulse  Amplitude  Modulation  (PAM),  Quadrature  Amplitude  Modulation  (QAM),  and  Phase 
Shift  Keying  (PSK),  and  orthogonal  modulation  and  its  variants.  The  chapter  includes  discussion 
of  the  number  of  degrees  of  freedom  available  on  a  bandlimited  channel,  the  Nyquist  criterion 
for  avoidance  of  intersymbol  interference,  and  typical  choices  of  Nyquist  and  square  root  Nyquist 
signaling  pulses.  We  also  provide  a  sneak  preview  of  power-bandwidth  tradeoffs  (with  detailed 
discussion  postponed  until  the  effect  of  noise  has  been  modeled  in  Chapters  5  and  6).  A  software 
lab  providing  a  hands-on  feel  for  Nyquist  signaling  is  included  in  this  chapter. 

The  material  in  Chapters  2  through  4  requires  only  a  background  in  signals  and  systems. 

•  Chapter  5  provides  a  review  of  basic  probability  and  random  variables,  and  then  introduces 
random processes. This chapter provides detailed discussion of Gaussian random variables, vectors
and processes; this is essential for modeling noise in communication systems. Examples
which provide a preview of receiver operations in communication systems, and computation of
performance measures such as error probability and signal-to-noise ratio (SNR), are provided.
Discussion of circular symmetry of white noise, and noise analysis of analog modulation techniques,
is placed in an appendix, since this is material that is often skipped in modern courses on
communication systems.

• Chapter 6 covers classical material on optimum demodulation for M-ary signaling in the presence
of additive white Gaussian noise (AWGN). The background on Gaussian random variables,
vectors  and  processes  developed  in  Chapter  5  is  applied  to  derive  optimal  receivers,  and  to  analyze 
their  performance.  After  discussing  error  probability  computation  as  a  function  of  SNR,  we  are 
able  to  combine  the  materials  in  Chapters  4  and  6  for  a  detailed  discussion  of  power-bandwidth 
tradeoffs.  Chapter  6  concludes  with  an  introduction  to  link  budget  analysis,  which  provides 
guidelines  on  the  choice  of  physical  link  parameters  such  as  transmit  and  receive  antenna  gains, 
and distance between transmitter and receiver, using what we know about the dependence of
error probability on SNR. This chapter includes a software lab which builds on the
Nyquist  signaling  lab  in  Chapter  4  by  investigating  the  effect  of  noise.  It  also  includes  another 
software  lab  simulating  performance  over  a  time-varying  wireless  channel,  examining  the  effects 
of  fading  and  diversity,  and  introduces  the  concept  of  differential  demodulation  for  avoidance  of 
explicit  channel  tracking. 

Chapters 2 through 6 provide a systematic lecture-style exposition of what I consider core concepts
in communication at an undergraduate level.

•  Chapter  7  provides  a  glimpse  of  information  theory  and  coding  whose  goal  is  to  stimulate  the 
reader  to  explore  further  using  more  advanced  resources  such  as  graduate  courses  and  textbooks. 
It  shows  the  critical  role  of  channel  coding,  provides  an  initial  exposure  to  information-theoretic 
performance benchmarks, and discusses belief propagation in detail, reinforcing the basic concepts
through a software lab.

• Chapter 8 provides a first exposure to the more advanced topics of communication over dispersive
channels, and of multiple antenna systems, often termed space-time communication, or
Multiple Input Multiple Output (MIMO) communication. These topics are grouped together because
they use similar signal processing tools. We emphasize lab-style “discovery” in this chapter
using three software labs, one on adaptive linear equalization for single-carrier modulation, one on
basic  OFDM  transceiver  operations,  and  one  on  MIMO  signal  processing  for  space-time  coding 
and  spatial  multiplexing.  The  goal  is  for  students  to  acquire  hands-on  insight  that  hopefully 
motivates  them  to  undertake  a  deeper  and  more  systematic  investigation. 

•  Finally,  the  epilogue  contains  speculation  on  future  directions  in  communications  research  and 
technology.  The  goal  is  to  provide  a  high-level  perspective  on  where  mastery  of  the  introductory 
material  in  this  textbook  could  lead,  and  to  argue  that  the  innovations  that  this  field  has  already 
seen  set  the  stage  for  many  exciting  developments  to  come. 

The role of software: Software problems and labs are integrated into the text, with “code fragments”
implementing core functionalities provided in the text. While code can be provided online,
separate  from  the  text  (and  indeed,  sample  code  is  made  available  online  for  instructors),  code 
fragments  are  integrated  into  the  text  for  two  reasons.  First,  they  enable  readers  to  immediately 
see  the  software  realization  of  a  key  concept  as  they  read  the  text.  Second,  I  feel  that  students 
learn  more  by  putting  in  the  work  of  writing  their  own  code,  building  on  these  code  fragments 
if  they  wish,  rather  than  using  code  that  is  easily  available  online.  The  particular  software  that 
we  use  is  Matlab,  because  of  its  widespread  availability,  and  because  of  its  importance  in  design 
and  performance  evaluation  in  both  academia  and  industry.  However,  the  code  fragments  can 
also  be  viewed  as  “pseudocode,”  and  can  be  easily  implemented  using  other  software  packages  or 
languages.  Block-based  packages  such  as  Simulink  (which  builds  upon  Matlab)  are  avoided  here, 
because  the  use  of  software  here  is  pedagogical  rather  than  aimed  at,  say,  designing  a  complete 
system  by  putting  together  subsystems  as  one  might  do  in  industry. 

Suggestions for using this book


I  view  Chapter  2  (complex  baseband),  Chapter  4  (digital  modulation),  and  Chapter  6  (optimum 
demodulation)  as  core  material  that  must  be  studied  to  understand  the  concepts  underlying 
modern  communication  systems.  Chapter  6  relies  on  the  probability  and  random  processes 
material  in  Chapter  5,  especially  the  material  on  jointly  Gaussian  random  variables  and  WGN, 
but  the  remaining  material  in  Chapter  5  can  be  skipped  or  covered  selectively,  depending  on 
the  students’  background.  Chapter  3  (analog  communication  techniques)  is  designed  such  that 
it  can  be  completely  skipped  if  one  wishes  to  focus  solely  on  digital  communication.  Finally, 
Chapter  7  and  Chapter  8  contain  glimpses  of  advanced  material  that  can  be  sampled  according 
to  the  instructor’s  discretion.  The  qualitative  discussion  in  the  epilogue  is  meant  to  provide  the 
student  with  perspective,  and  is  not  intended  for  formal  coverage  in  the  classroom. 

In  my  own  teaching  at  UCSB,  this  material  forms  the  basis  for  a  two-course  sequence,  with 
Chapters  2-4  covered  in  the  first  course,  and  Chapters  5-6  covered  in  the  second  course,  with 
the  dispersive  channels  portion  of  Chapter  8  providing  the  basis  for  the  labs  in  the  second 
course. The content of these courses is constantly being revised, and it is anticipated that the
material  on  channel  coding  and  MIMO  may  displace  some  of  the  existing  material  in  the  future. 
UCSB  is  on  a  quarter  system,  hence  the  coverage  is  fast-paced,  and  many  topics  are  omitted  or 
skimmed.  There  is  ample  material  here  for  a  two-semester  undergraduate  course  sequence.  For 
a  one-semester  course,  one  possible  organization  is  to  cover  Chapter  2  (focusing  on  the  complex 
envelope),  Chapter  4,  a  selection  of  Chapter  5,  Chapter  6,  and  if  time  permits,  Chapter  7. 

The  slides  accompanying  the  book  are  not  intended  to  provide  comprehensive  coverage  of  the 
material,  but  rather,  to  provide  an  example  of  selections  from  the  material  to  be  covered  in 
the  classroom.  I  must  comment  in  particular  on  Chapter  5.  While  much  of  the  book  follows 
the  format  in  which  I  lecture,  Chapter  5  is  structured  as  a  reference  on  probability,  random 
variables  and  random  processes  that  the  instructor  must  pick  and  choose  from,  depending  on 
the  background  of  the  students  in  the  class.  The  particular  choices  I  make  in  my  own  lectures 
on  this  material  are  reflected  in  the  slides  for  this  chapter. 

Chapter 1
Introduction 


This textbook provides an introduction to the conceptual underpinnings of communication technologies.
Most of us directly experience such technologies daily: browsing (and audio/video
streaming  from)  the  Internet,  sending/receiving  emails,  watching  television,  or  carrying  out  a 
phone  conversation.  Many  of  these  experiences  occur  on  mobile  devices  that  we  carry  around 
with  us,  so  that  we  are  always  connected  to  the  cyberworld  of  modern  communication  systems. 
In addition, there is a huge amount of machine-to-machine communication that we do not directly
experience, but which is indispensable for the operation of modern society. Examples
include  signaling  between  routers  on  the  Internet,  or  between  processors  and  memories  on  any 
computing  device. 

We define communication as the process of information transfer across space or time. Communication
across space is something we have an intuitive understanding of: for example, radio
waves  carry  our  phone  conversation  between  our  cell  phone  and  the  nearest  base  station,  and 
coaxial  cables  (or  optical  fiber,  or  radio  waves  from  a  satellite)  deliver  television  from  a  remote 
location to our home. However, a moment’s thought shows that communication across time,
or  storage  of  information,  is  also  an  everyday  experience,  given  our  use  of  storage  media  such  as 
compact  discs  (CDs),  digital  video  discs  (DVDs),  hard  drives  and  memory  sticks.  In  all  of  these 
instances,  the  key  steps  in  the  operation  of  a  communication  link  are  as  follows: 

(a) insertion of information into a signal, termed the transmitted signal, compatible with the
physical medium of interest;

(b) propagation of the signal through the physical medium (termed the channel) in space or
time;

(c) extraction of information from the signal (termed the received signal) obtained after propagation
through the medium.

In  this  book,  we  study  the  fundamentals  of  modeling  and  design  for  these  steps. 

Chapter Plan: In Section 1.1, we provide a high-level description of analog and digital communication
systems, and discuss why digital communication is the inevitable design choice in
modern systems. In Section 1.2, we briefly provide a technological perspective on recent developments
in communication. We do not attempt to provide a comprehensive discussion of the
fascinating  history  of  communication:  thanks  to  the  advances  in  communication  that  brought  us 
the  Internet,  it  is  easy  to  look  it  up  online!  A  discussion  of  the  scope  of  this  textbook  is  provided 
in  Section  1.3. 


1.1  Analog  or  Digital? 

Even  without  defining  information  formally,  we  intuitively  understand  that  speech,  audio,  and 
video  signals  contain  information.  We  use  the  term  message  signals  for  such  signals,  since  these 
are the messages we wish to convey over a communication system. In their original form (both
during generation and consumption), these message signals are analog: they are continuous-time
signals, with the signal values also lying in a continuum. When someone plays the violin,
an  analog  acoustic  signal  is  generated  (often  translated  to  an  analog  electrical  signal  using  a 
microphone).  Even  when  this  music  is  recorded  onto  a  digital  storage  medium  such  as  a  CD  (using 
the  digital  communication  framework  outlined  in  Section  1.1.2),  when  we  ultimately  listen  to  the 
CD  being  played  on  an  audio  system,  we  hear  an  analog  acoustic  signal.  The  transmitted  signals 
corresponding  to  physical  communication  media  are  also  analog.  For  example,  in  both  wireless 
and optical communication, we employ electromagnetic waves, which correspond to continuous-time
electric and magnetic fields taking values in a continuum.


1.1.1  Analog  communication 


[Block diagram: message signal -> modulator -> transmitted signal -> channel -> received signal -> demodulator -> message signal (estimate)]


Figure  1.1:  Block  diagram  for  an  analog  communication  system.  The  modulator  transforms 
the  message  signal  into  the  transmitted  signal.  The  channel  distorts  and  adds  noise  to  the 
transmitted  signal.  The  demodulator  extracts  an  estimate  of  the  message  signal  from  the  received 
signal  arriving  from  the  channel. 


Given  the  analog  nature  of  both  the  message  signal  and  the  communication  medium,  a  natural 
design  choice  is  to  map  the  analog  message  signal  (e.g.,  an  audio  signal,  translated  from  the 
acoustic  to  electrical  domain  using  a  microphone)  to  an  analog  transmitted  signal  (e.g.,  a  radio 
wave  carrying  the  audio  signal)  that  is  compatible  with  the  physical  medium  over  which  we  wish 
to  communicate  (e.g.,  broadcasting  audio  over  the  air  from  an  FM  radio  station).  This  approach 
to  communication  system  design,  depicted  in  Figure  1.1,  is  termed  analog  communication.  Early 
communication  systems  were  all  analog:  examples  include  AM  (amplitude  modulation)  and  FM 
(frequency  modulation)  radio,  analog  television,  first  generation  cellular  phone  technology  (based 
on FM), vinyl records, audio cassettes, and VHS or Beta videocassettes.

While analog communication might seem like the most natural option, it is in fact obsolete. Cellular
phone technologies from the second generation onwards are digital, vinyl records and audio
cassettes  have  been  supplanted  by  CDs,  and  videocassettes  by  DVDs.  Broadcast  technologies 
such  as  radio  and  television  are  often  slower  to  upgrade  because  of  economic  and  political  factors, 
but  digital  broadcast  radio  and  television  technologies  are  either  replacing  or  sidestepping  (e.g., 
via  satellite)  analog  FM/AM  radio  and  television  broadcast.  Let  us  now  define  what  we  mean  by 
digital  communication,  before  discussing  the  reasons  for  the  inexorable  trend  away  from  analog 
and  towards  digital  communication. 


1.1.2  Digital  communication 

The  conceptual  basis  for  digital  communication  was  established  in  1948  by  Claude  Shannon, 
when  he  founded  the  field  of  information  theory.  There  are  two  main  threads  to  this  theory: 

• Source coding and compression: Any information-bearing signal can be represented efficiently,
to within a desired accuracy of reproduction, by a digital signal (i.e., a discrete-time
signal taking values from a discrete set), which in its simplest form is just a sequence of binary
digits (zeros or ones), or bits. This is true whether the information source is text, speech, audio
or video. Techniques for performing the mapping from the original source signal to a bit
sequence are generically termed source coding. They often involve compression, or removal of
redundancy, in a manner that exploits the properties of the source signal (e.g., the heavy spatial
correlation among adjacent pixels in an image can be exploited to represent it more efficiently
than a pixel-by-pixel representation).

• Digital information transfer: Once the source encoding is done, our communication task reduces
to reliably transferring the bit sequence at the output of the source encoder across space or
time,  without  worrying  about  the  original  source  and  the  sophisticated  tricks  that  have  been  used 
to  encode  it.  The  performance  of  any  communication  system  depends  on  the  relative  strengths 
of  the  signal  and  noise  or  interference,  and  the  distortions  imposed  by  the  channel.  Shannon 
showed  that,  once  we  fix  these  operational  parameters  for  any  communication  channel,  there 
exists  a  maximum  possible  rate  of  reliable  communication,  termed  the  channel  capacity.  Thus, 
given  the  information  bits  at  the  output  of  the  source  encoder,  in  principle,  we  can  transmit  them 
reliably  over  a  given  link  as  long  as  the  information  rate  is  smaller  than  the  channel  capacity, 
and we cannot transmit them reliably if the information rate is larger than the channel capacity.
This sharp transition between reliable and unreliable communication differs fundamentally
from  analog  communication,  where  the  quality  of  the  reproduced  source  signal  typically  degrades 
gradually  as  the  channel  conditions  get  worse. 
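
The sharp threshold described above can be made concrete with a small sketch. This section does not give a capacity formula, so the example assumes the standard result for an ideal bandlimited channel with additive white Gaussian noise, C = W log2(1 + SNR) bits per second; the 20 MHz bandwidth and 10 dB SNR are hypothetical values chosen only for illustration.

```python
import math


def awgn_capacity_bps(bandwidth_hz, snr_linear):
    """Capacity of an ideal bandlimited AWGN channel, in bits per second.

    Assumption: C = W * log2(1 + SNR), with SNR given as a linear ratio.
    """
    return bandwidth_hz * math.log2(1.0 + snr_linear)


def is_reliable_rate(rate_bps, bandwidth_hz, snr_db):
    """True if the information rate is strictly below the channel capacity."""
    snr_linear = 10.0 ** (snr_db / 10.0)
    return rate_bps < awgn_capacity_bps(bandwidth_hz, snr_linear)


# Hypothetical link parameters, for illustration only.
bandwidth_hz = 20e6
snr_db = 10.0

capacity = awgn_capacity_bps(bandwidth_hz, 10.0 ** (snr_db / 10.0))
print(f"capacity is about {capacity / 1e6:.1f} Mbit/s")
for rate in (50e6, 80e6):
    verdict = "reliable in principle" if is_reliable_rate(rate, bandwidth_hz, snr_db) else "not reliable"
    print(f"{rate / 1e6:.0f} Mbit/s: {verdict}")
```

The formula says nothing about how to achieve rates close to capacity; that is the job of the channel coding techniques previewed in Chapter 7.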

A  block  diagram  for  a  typical  digital  communication  system  based  on  these  two  threads  is  shown 
in  Figure  1.2.  We  now  briefly  describe  the  role  of  each  component,  together  with  simplified 
examples  of  its  function. 


[Block diagram; labels recoverable from the original figure: message signal, information bits, message signal]


Figure  1.2:  Components  of  a  digital  communication  system. 


Source  encoder:  As  already  discussed,  the  source  encoder  converts  the  message  signal  into  a 
sequence  of  information  bits.  The  information  bit  rate  depends  on  the  nature  of  the  message 
signal  (e.g.,  speech,  audio,  video)  and  the  application  requirements.  Even  when  we  fix  the  class 
of  message  signals,  the  choice  of  source  encoder  is  heavily  dependent  on  the  setting.  For  example, 
video  signals  are  heavily  compressed  when  they  are  sent  over  a  cellular  link  to  a  mobile  device, 
but are lightly compressed when sent to a high-definition television (HDTV) set. A cellular link
can support a much smaller bit rate than, say, the cable connecting a DVD player to an HDTV
set,  and  a  smaller  mobile  display  device  requires  lower  resolution  than  a  large  HDTV  screen.  In 
general,  the  source  encoder  must  be  chosen  such  that  the  bit  rate  it  generates  can  be  supported 
by  the  digital  communication  link  we  wish  to  transfer  information  over.  Other  than  this,  source 
coding  can  be  decoupled  entirely  from  link  design  (we  comment  further  on  this  a  bit  later). 
Example:  A  laptop  display  may  have  resolution  1024  x  768  pixels.  For  a  grayscale  digital  image, 
the  intensity  for  each  pixel  might  be  represented  by  8  bits.  Multiplying  by  the  number  of 
pixels  gives  us  about  6.3  million  bits,  or  about  0.8  Mbyte  (a  byte  equals  8  bits).  However, 
for  a  typical  image,  the  intensities  for  neighboring  pixels  are  heavily  correlated,  which  can  be 
exploited  for  significantly  reducing  the  number  of  bits  required  to  represent  the  image,  without 
noticeably distorting it. For example, one could take a two-dimensional Fourier transform, which
concentrates most of the information in the image at lower frequencies, and then discard many
of the high-frequency coefficients. There are other possible transforms one could use, and also
several more processing stages, but the bottom line is that, for natural images, state-of-the-art
image compression algorithms can provide 10X compression (i.e., reduction in the number of bits
relative to the original uncompressed digital image) with hardly any perceptual degradation. Far
more  aggressive  compression  ratios  are  possible  if  we  are  willing  to  tolerate  more  distortion.  For 
video,  in  addition  to  the  spatial  correlation  exploited  for  image  compression,  we  can  also  exploit 
temporal  correlation  across  successive  frames. 
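
The arithmetic in this example, and the idea of discarding high-frequency coefficients, can be checked with a short sketch. It is written in Python using only the standard library, and it simplifies heavily: it works on a single synthetic row of 64 pixel intensities with a one-dimensional DFT rather than a full image with a two-dimensional transform, and the row is constructed (as an assumption) to be exactly bandlimited, so the reconstruction is essentially perfect. Natural images are only approximately concentrated at low frequencies.

```python
import cmath
import math


def raw_image_bits(width, height, bits_per_pixel):
    """Number of bits in an uncompressed image."""
    return width * height * bits_per_pixel


def dft(x):
    """Direct O(n^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(coeffs):
    """Inverse of dft()."""
    n = len(coeffs)
    return [sum(coeffs[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def keep_low_frequencies(coeffs, max_freq):
    """Zero every coefficient whose frequency index exceeds max_freq in magnitude."""
    n = len(coeffs)
    return [coeffs[k] if min(k, n - k) <= max_freq else 0 for k in range(n)]


# Arithmetic from the text: 1024 x 768 pixels at 8 bits per pixel.
bits = raw_image_bits(1024, 768, 8)
print(bits, "bits =", bits // 8, "bytes")   # 6291456 bits = 786432 bytes
print("after 10X compression:", bits // 10, "bits")

# Synthetic, slowly varying row of pixel intensities (hypothetical data).
n = 64
row = [128 + 50 * math.cos(2 * math.pi * t / n) + 20 * math.sin(4 * math.pi * t / n)
       for t in range(n)]

max_freq = 2
kept = keep_low_frequencies(dft(row), max_freq)
approx = idft(kept)
max_error = max(abs(a - b.real) for a, b in zip(row, approx))
print("coefficients kept:", 2 * max_freq + 1, "of", n)
print("maximum reconstruction error:", max_error)
```

Here 5 of the 64 coefficients suffice because of how the row was built; for real image data the discarded coefficients are small rather than zero, which is where the (hopefully imperceptible) distortion comes from.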

Channel  encoder:  The  channel  encoder  adds  redundancy  to  the  information  bits  obtained 
from  the  source  encoder,  in  order  to  facilitate  error  recovery  after  transmission  over  the  channel. 
It  might  appear  that  we  are  putting  in  too  much  work,  adding  redundancy  just  after  the  source 
encoder  has  removed  it.  However,  the  redundancy  added  by  the  channel  encoder  is  tailored  to 
the  channel  over  which  information  transfer  is  to  occur,  whereas  the  redundancy  in  the  original 
message  signal  is  beyond  our  control,  so  that  it  would  be  inefficient  to  keep  it  when  we  transmit 
the  signal  over  the  channel. 

Example:  The  noise  and  distortion  introduced  by  the  channel  can  cause  errors  in  the  bits  we 
send  over  it.  Consider  the  following  abstraction  for  a  channel:  we  can  send  a  string  of  bits  (zeros 
or  ones)  over  it,  and  the  channel  randomly  flips  each  bit  with  probability  0.01  (i.e.,  the  channel 
has  a  1%  error  rate).  If  we  cannot  tolerate  this  error  rate,  we  could  repeat  each  bit  that  we  wish 
to  send  three  times,  and  use  a  majority  rule  to  decide  on  its  value.  Now,  we  only  make  an  error 
if  two  or  more  of  the  three  bits  are  flipped  by  the  channel.  It  is  left  as  an  exercise  to  calculate 
that  an  error  now  happens  with  probability  approximately  0.0003  (i.e.,  the  error  rate  has  gone 
down  to  0.03%).  That  is,  we  have  improved  performance  by  introducing  redundancy.  Of  course, 
there  are  far  more  sophisticated  and  efficient  techniques  for  introducing  redundancy  than  the 
simple  repetition  strategy  just  described;  see  Chapter  7. 
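
For readers who want to check the calculation, here is a minimal sketch of the repetition strategy, written in Python with only the standard library. The function names are our own, and the channel is exactly the abstraction described above: each bit is flipped independently with probability p. A majority vote over three copies fails when two or three copies are flipped, which gives an error probability of 3p^2(1 - p) + p^3.

```python
import random


def repetition_encode(bits, n=3):
    """Repeat each information bit n times."""
    return [b for b in bits for _ in range(n)]


def bit_flip_channel(bits, p, rng):
    """Flip each bit independently with probability p."""
    return [b ^ 1 if rng.random() < p else b for b in bits]


def majority_decode(coded, n=3):
    """Decide each information bit by majority vote over its n copies."""
    return [1 if sum(coded[i:i + n]) > n // 2 else 0
            for i in range(0, len(coded), n)]


def majority_error_probability(p):
    """Probability that two or three of the three copies are flipped."""
    return 3 * p ** 2 * (1 - p) + p ** 3


p = 0.01
print("analytical error rate:", majority_error_probability(p))

# Monte Carlo check with a fixed seed so that the run is repeatable.
rng = random.Random(0)
info = [rng.randint(0, 1) for _ in range(200000)]
received = bit_flip_channel(repetition_encode(info), p, rng)
decoded = majority_decode(received)
errors = sum(a != b for a, b in zip(info, decoded))
print("simulated error rate:", errors / len(info))
```

For p = 0.01 the formula evaluates to 0.000298, consistent with the approximate value quoted above. The price is that we now send three channel bits for every information bit.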

Modulator:  The  modulator  maps  the  coded  bits  at  the  output  of  the  channel  encoder  to  a 
transmitted  signal  to  be  sent  over  the  channel.  For  example,  we  may  insist  that  the  transmitted 
signal  fit  within  a  given  frequency  band  and  adhere  to  stringent  power  constraints  in  a  wireless 
system,  where  interference  between  users  and  between  co-existing  systems  is  a  major  concern. 
Unlicensed  WiFi  transmissions  typically  occupy  20-40  MHz  of  bandwidth  in  the  2.4  or  5  GHz 
bands.  Transmissions  in  fourth  generation  cellular  systems  may  often  occupy  bandwidths  ranging 
from  1-20  MHz  at  frequencies  ranging  from  700  MHz  to  3  GHz.  While  these  signal  bandwidths 
are being increased in an effort to increase data rates (e.g., up to 160 MHz for emerging WiFi
standards,  and  up  to  100  MHz  for  emerging  cellular  standards),  and  new  frequency  bands  are 
being  actively  explored  (see  the  epilogue  for  more  discussion),  the  transmitted  signal  still  needs 
to  be  shaped  to  fit  within  certain  spectral  constraints. 

Example:  Suppose  that  we  send  bit  value  0  by  transmitting  the  signal  s(t),  and  bit  value  1  by 
transmitting -s(t). Even for this simple example, we must design the signal s(t) so it fits within
spectral  constraints  (e.g.,  two  different  users  may  use  two  different  segments  of  spectrum  to  avoid 
interfering  with  each  other),  and  we  must  figure  out  how  to  prevent  successive  bits  of  the  same 
user  from  interfering  with  each  other.  For  wireless  communication,  these  signals  are  voltages 
generated by circuits coupled to antennas, and are ultimately emitted as electromagnetic waves
from  the  antennas. 

The channel encoder and modulator are typically jointly designed, keeping in mind the anticipated
channel conditions, and the result is termed a coded modulator.

Channel: The channel distorts and adds noise, and possibly interference, to the transmitted signal.
Much of our success in developing communication technologies has resulted from being able
to optimize communication strategies based on accurate mathematical models for the channel.
Such models are typically statistical, and are developed with significant effort using a combination
of measurement and computation. The physical characteristics of the communication
medium  vary  widely,  and  hence  so  do  the  channel  models.  Wireline  channels  are  typically  well 
modeled  as  linear  and  time-invariant,  while  optical  fiber  channels  exhibit  nonlinearities.  Wireless 
mobile  channels  are  particularly  challenging  because  of  the  time  variations  caused  by  mobility, 
and because of the potential for interference arising from the broadcast nature of the medium. The link
design  also  depends  on  system-level  characteristics,  such  as  whether  or  not  the  transmitter  has 
feedback  regarding  the  channel,  and  what  strategy  is  used  to  manage  interference. 

Example: Consider communication between a cellular base station and a mobile device. The
electromagnetic waves emitted by the base station can reach the mobile’s antennas through multiple
paths,  including  bounces  off  streets  and  building  surfaces.  The  received  signal  at  the  mobile  can 
be  modeled  as  multiple  copies  of  the  transmitted  signal  with  different  gains  and  delays.  These 
gains  and  delays  change  due  to  mobility,  but  the  rate  of  change  is  often  slow  compared  to  the 
data  rate,  hence  over  short  intervals,  we  can  get  away  with  modeling  the  channel  as  a  linear 
time-invariant  system  that  the  transmitted  signal  goes  through  before  arriving  at  the  receiver. 
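
The multipath model in this example can be sketched in discrete time as follows. This is an illustration only: the gains and delays are made-up numbers, the delays are assumed to be whole numbers of samples, and the function name multipath_channel is hypothetical. The sketch adds up scaled, delayed copies of the transmitted samples and then checks that the result matches convolution with a single impulse response, which is what modeling the channel as a linear time-invariant system means.

```python
import numpy as np


def multipath_channel(x, gains, delays):
    """Sum of scaled, delayed copies of x: y[n] = sum_k gains[k] * x[n - delays[k]].

    Delays are non-negative integers (in samples). Gains and delays are held
    fixed, i.e., the channel is treated as time-invariant over this interval.
    """
    x = np.asarray(x, dtype=float)
    y = np.zeros(len(x) + max(delays))
    for g, d in zip(gains, delays):
        y[d:d + len(x)] += g * x
    return y


# Hypothetical three-path channel (illustrative values, not measurements).
gains = [1.0, 0.5, -0.2]
delays = [0, 2, 5]

x = np.array([1.0, -1.0, 1.0, 1.0])  # transmitted samples
y = multipath_channel(x, gains, delays)

# Equivalent impulse response: one tap per path.
h = np.zeros(max(delays) + 1)
for g, d in zip(gains, delays):
    h[d] += g

# The multipath sum equals convolution of the input with h.
assert np.allclose(y, np.convolve(x, h))
```

When the mobile moves, the gains and delays drift, so in practice h would have to be re-estimated from time to time.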

Demodulator:  The  demodulator  processes  the  signal  received  from  the  channel  to  produce  bit 
estimates  to  be  fed  to  the  channel  decoder.  It  typically  performs  a  number  of  signal  processing 
tasks,  such  as  synchronization  of  phase,  frequency  and  timing,  and  compensating  for  distortions 
induced  by  the  channel. 

Example:  Consider  the  simplest  possible  channel  model,  where  the  channel  just  adds  noise  to 
the  transmitted  signal.  In  our  earlier  example  of  sending  ±s(t)  to  send  0  or  1,  the  demodulator 
must  guess,  based  on  the  noisy  received  signal,  which  of  these  two  options  is  true.  It  might  make 
a  hard  decision  (e.g.,  guess  that  0  was  sent),  or  hedge  its  bets,  and  make  a  soft  decision,  saying, 
for  example,  that  it  is  80%  sure  that  the  transmitted  bit  is  a  zero.  There  are  a  host  of  other 
aspects  of  demodulation  that  we  have  swept  under  the  rug:  for  example,  before  making  any 
decisions,  the  demodulator  has  to  perform  functions  such  as  synchronization  (making  sure  that 
the  receiver’s  notion  of  time  and  frequency  is  consistent  with  the  transmitter’s)  and  equalization 
(compensating  for  the  distortions  due  to  the  channel). 
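
To make the distinction between hard and soft decisions concrete, here is a minimal sketch under assumptions that go beyond the example: the received signal has already been reduced to a single number y, equal to +A plus noise when bit 0 is sent and -A plus noise when bit 1 is sent; the noise is Gaussian with zero mean and standard deviation sigma; and the two bit values are equally likely beforehand. The names hard_decision and prob_zero are hypothetical.

```python
import math


def hard_decision(y: float) -> int:
    """Guess the bit from the sign of y (+A was sent for 0, -A for 1)."""
    return 0 if y >= 0 else 1


def prob_zero(y: float, amplitude: float, sigma: float) -> float:
    """Posterior probability that bit 0 was sent, given observation y.

    Assumes y = +/-amplitude + Gaussian noise (std sigma) and equal priors.
    The log-likelihood ratio is 2*A*y/sigma^2, and the posterior is its
    logistic function, evaluated here in a numerically stable way.
    """
    llr = 2.0 * amplitude * y / sigma**2
    if llr >= 0:
        return 1.0 / (1.0 + math.exp(-llr))
    e = math.exp(llr)
    return e / (1.0 + e)


y = math.log(4) / 2  # roughly 0.69, chosen so that the posterior is 0.8
print(hard_decision(y))                   # 0
print(round(prob_zero(y, 1.0, 1.0), 3))   # 0.8
```

In this case the hard decision reports only "0", while the soft decision also conveys that the demodulator is 80% sure, information that the channel decoder can exploit.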

Channel  decoder:  The  channel  decoder  processes  the  imperfect  bit  estimates  provided  by 
the  demodulator,  and  exploits  the  controlled  redundancy  introduced  by  the  channel  encoder  to 
estimate  the  information  bits. 

Example:  The  channel  decoder  takes  the  guesses  from  the  demodulator  and  uses  the  redundancies 
in  the  channel  code  to  clean  up  the  decisions.  In  our  simple  example  of  repeating  every  bit  three 
times,  it  might  use  a  majority  rule  to  make  its  final  decision  if  the  demodulator  is  putting  out 
hard  decisions.  For  soft  decisions,  it  might  use  more  sophisticated  combining  rules  with  improved 
performance. 
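
The difference between the two decoding rules can be seen in a small sketch. We assume, purely for illustration, that each of the three repetitions is received as a number equal to +A (bit 0) or -A (bit 1) plus independent Gaussian noise of the same variance, and that both bit values are equally likely. Under these assumptions, adding up the three received values and taking the sign is a natural soft combining rule. The function names are hypothetical.

```python
def majority_decode(hard_bits):
    """Majority vote over an odd number of hard decisions (each 0 or 1)."""
    return 1 if sum(hard_bits) > len(hard_bits) / 2 else 0


def soft_decode(received):
    """Soft combining: add the received values and decide from the sign.

    Assumes +A was sent for bit 0 and -A for bit 1 on every repetition,
    with independent, equal-variance Gaussian noise.
    """
    return 0 if sum(received) >= 0 else 1


# Two weak votes for 0 and one strong vote for 1 (made-up values).
received = [0.1, 0.2, -1.5]
hard_bits = [0 if y >= 0 else 1 for y in received]

print(majority_decode(hard_bits))  # 0
print(soft_decode(received))       # 1
```

The two rules disagree here because the majority rule counts every vote equally, whereas the soft rule weighs each repetition by how reliable it appears to be.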

While  we  have  described  the  demodulator  and  decoder  as  operating  separately  and  in  sequence 
for  simplicity,  there  can  be  significant  benefits  from  iterative  information  exchange  between  the 
two.  In  addition,  for  certain  coded  modulation  strategies  in  which  channel  coding  and  modulation 
are  tightly  coupled,  the  demodulator  and  channel  decoder  may  be  integrated  into  a  single  entity. 

Source  decoder:  The  source  decoder  processes  the  estimated  information  bits  at  the  output 
of  the  channel  decoder  to  obtain  an  estimate  of  the  message.  The  message  format  may  or  may 
not  be  the  same  as  that  of  the  original  message  input  to  the  source  encoder:  for  example,  the 
source  encoder  may  translate  speech  to  text  before  encoding  into  bits,  and  the  source  decoder 
may output a text message to the end user.

Example:  For  the  example  of  a  digital  image  considered  earlier,  the  compressed  image  can  be 
translated  back  to  a  pixel-by-pixel  representation  by  taking  the  inverse  spatial  Fourier  transform 
of  the  coefficients  that  survived  the  compression. 

We  are  now  ready  to  compare  analog  and  digital  communication,  and  discuss  why  the  trend 
towards  digital  is  inevitable. 


1.1.3  Why  digital? 

Comparing  the  block  diagrams  for  analog  and  digital  communication  in  Figures  1.1  and  1.2, 
respectively, we see that the digital communication system involves far more processing. However,
this is not an obstacle for modern transceiver design, due to the exponential increase in
the  computational  power  of  low-cost  silicon  integrated  circuits.  Digital  communication  has  the 
following  key  advantages. 

Optimality:  For  a  point-to-point  link,  it  is  optimal  to  separately  optimize  source  coding  and 
channel coding, as long as we do not mind the delay and processing incurred in doing so. Due
to this source-channel separation principle, we can leverage the best available source codes and
the  best  available  channel  codes  in  designing  a  digital  communication  system,  independently 
of each other. Efficient source encoders must be highly specialized. For example, state-of-the-art
speech encoders, video compression algorithms, or text compression algorithms are very
different  from  each  other,  and  are  each  the  result  of  significant  effort  over  many  years  by  a  large 
community  of  researchers.  However,  once  source  encoding  is  performed,  the  coded  modulation 
scheme  used  over  the  communication  link  can  be  engineered  to  transmit  the  information  bits 
reliably,  regardless  of  what  kind  of  source  they  correspond  to,  with  the  bit  rate  limited  only 
by  the  channel  and  transceiver  characteristics.  Thus,  the  design  of  a  digital  communication 
link is source-independent and channel-optimized. In contrast, the waveform transmitted in an
analog  communication  system  depends  on  the  message  signal,  which  is  beyond  the  control  of  the 
link  designer,  hence  we  do  not  have  the  freedom  to  optimize  link  performance  over  all  possible 
communication  schemes.  This  is  not  just  a  theoretical  observation:  in  practice,  huge  performance 
gains  are  obtained  from  switching  from  analog  to  digital  communication. 

Scalability: While Figure 1.2 shows a single digital communication link between source encoder
and decoder, under the source-channel separation principle, there is nothing preventing
us  from  inserting  additional  links,  putting  the  source  encoder  and  decoder  at  the  end  points. 
This  is  because  digital  communication  allows  ideal  regeneration  of  the  information  bits,  hence 
every  time  we  add  a  link,  we  can  focus  on  communicating  reliably  over  that  particular  link. 
(Of  course,  information  bits  do  not  always  get  through  reliably,  hence  we  typically  add  error 
recovery  mechanisms  such  as  retransmission,  at  the  level  of  an  individual  link  or  “end-to-end” 
over  a  sequence  of  links  between  the  information  source  and  sink.)  Another  consequence  of  the 
source-channel separation principle is that, since information bits are transported without interpretation,
the same link can be used to carry multiple kinds of messages. A particularly useful
approach  is  to  chop  the  information  bits  up  into  discrete  chunks,  or  packets,  which  can  then 
be  processed  independently  on  each  link.  These  properties  of  digital  communication  are  critical 
for  enabling  massively  scalable,  general  purpose,  communication  networks  such  as  the  Internet. 
Such  networks  can  have  large  numbers  of  digital  communication  links,  possibly  with  different 
characteristics,  independently  engineered  to  provide  “bit  pipes”  that  can  support  data  rates. 
Messages  of  various  kinds,  after  source  encoding,  are  reduced  to  packets,  and  these  packets  are 
switched  along  different  paths  along  the  network,  depending  on  the  identities  of  the  source  and 
destination  nodes,  and  the  loads  on  different  links  in  the  network.  None  of  this  would  be  possible 
with  analog  communication:  link  performance  in  an  analog  communication  system  depends  on 
message  properties,  and  successive  links  incur  noise  accumulation,  which  limits  the  number  of 
links  which  can  be  cascaded. 
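
The contrast between noise accumulation and regeneration can be illustrated with a back-of-the-envelope calculation. The sketch below rests on simplifying assumptions that are ours rather than the text's: every analog link has unit gain and adds independent noise of the same power, and every digital link regenerates the bits and flips each one independently with the same small probability p. The function names are hypothetical.

```python
import math


def analog_snr_db(snr_per_link_db: float, num_links: int) -> float:
    """SNR (dB) after num_links identical unit-gain analog links in cascade.

    Noise powers add, so the SNR drops by 10*log10(num_links) dB.
    """
    return snr_per_link_db - 10.0 * math.log10(num_links)


def digital_error_prob(p: float, num_links: int) -> float:
    """End-to-end bit error probability over num_links regenerating links.

    A bit arrives in error if it is flipped an odd number of times, which
    happens with probability (1 - (1 - 2p)^N) / 2, roughly N*p for small p.
    """
    return 0.5 * (1.0 - (1.0 - 2.0 * p) ** num_links)


for n in (1, 10, 100):
    print(n, f"{analog_snr_db(30.0, n):.1f} dB", f"{digital_error_prob(1e-6, n):.2e}")
```

Under these assumptions, a hundred analog links cost 20 dB of SNR, while a hundred digital links raise the error probability from about one in a million to about one in ten thousand, which can be driven down further by making each link more reliable.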

The preceding makes it clear that source-channel separation, and the associated bit pipe abstraction,
is crucial in the formation and growth of modern communication networks. However, there
are some important caveats that are worth noting. Joint source-channel design can provide better
performance in some settings, especially when there are constraints on delay or complexity,
or  if  multiple  users  are  being  supported  simultaneously  on  a  given  communication  medium.  In 
practice,  this  means  that  “local”  violations  of  the  separation  principle  (e.g.,  over  a  wireless  last 
hop  in  a  communication  network)  may  be  a  useful  design  trick.  Similarly,  the  bit  pipe  abstraction 
used  by  network  designers  is  too  simplistic  for  the  design  of  wireless  networks  at  the  edge  of  the 
Internet:  physical  properties  of  the  wireless  channel  such  as  interference,  multipath  propagation 
and  mobility  must  be  taken  into  account  in  network  engineering. 


1.1.4  Why  analog  design  remains  important 

While  we  are  interested  in  transporting  bits  in  digital  communication,  the  physical  link  over 
which  these  bits  are  sent  is  analog.  Thus,  analog  and  mixed  signal  (digital/analog)  design  play 
a crucial role in modern digital communication systems. Analog design of digital-to-analog
converters,  mixers,  amplifiers  and  antennas  is  required  to  translate  bits  to  physical  waveforms  to 
be  emitted  by  the  transmitter.  At  the  receiver,  analog  design  of  antennas,  amplifiers,  mixers  and 
analog-to-digital  converters  is  required  to  translate  the  physical  received  waveforms  to  digital 
(discrete  valued,  discrete  time)  signals  that  are  amenable  to  the  digital  signal  processing  that 
is  at  the  core  of  modern  transceivers.  Analog  circuit  design  for  communications  is  therefore 
a  thriving  field  in  its  own  right,  which  this  textbook  makes  no  attempt  to  cover.  However, 
the  material  in  Chapter  3  on  analog  communication  techniques  is  intended  to  introduce  digital 
communication  system  designers  to  some  of  the  high-level  issues  addressed  by  analog  circuit 
designers.  The  goal  is  to  establish  enough  of  a  common  language  to  facilitate  interaction  between 
system  and  circuit  designers.  While  much  of  digital  communication  system  design  can  be  carried 
out  by  abstracting  out  the  intervening  analog  design  (as  done  in  Chapters  4  through  8),  closer 
interaction  between  system  and  circuit  designers  becomes  increasingly  important  as  we  push  the 
limits  of  communication  systems,  as  briefly  indicated  in  the  epilogue. 


1.2  A  Technology  Perspective 

We  now  discuss  some  technology  trends  and  concepts  that  have  driven  the  astonishing  growth 
in  communication  systems  in  the  past  two  decades,  and  that  are  expected  to  impact  future 
developments  in  this  area.  Our  discussion  is  structured  in  terms  of  big  technology  “stories.” 

Technology  story  1:  The  Internet.  Some  of  the  key  ingredients  that  contributed  to  its 
growth  and  the  essential  role  it  plays  in  our  lives  are  as  follows: 

•  Any  kind  of  message  can  be  chopped  up  into  packets  and  routed  across  the  network,  using  an 
Internet  Protocol  (IP)  that  is  simple  to  implement  in  software; 

• Advances in optical fiber communication and high-speed digital hardware enable a super-fast
“core” of routers connected by very high-speed, long-range links that enable worldwide coverage;

•  The  World  Wide  Web,  or  web,  makes  it  easy  to  organize  information  into  interlinked  hypertext 
documents  which  can  be  browsed  from  anywhere  in  the  world; 

•  The  digitization  of  content  (audio,  video,  books)  means  that  ultimately  “all”  information  is 
expected  to  be  available  on  the  web; 

•  Search  engines  enable  us  to  efficiently  search  for  this  information; 

•  Connectivity  applications  such  as  email,  teleconferencing,  videoconferencing  and  online  social 
networks  have  become  indispensable  in  our  daily  lives. 

Technology  story  2:  Wireless.  Cellular  mobile  networks  are  everywhere,  and  are  based  on 
the  breakthrough  concept  that  ubiquitous  tetherless  connectivity  can  be  provided  by  breaking 
the world into cells, with “spatial reuse” of precious spectrum resources in cells that are “far
enough”  apart.  Base  stations  serve  mobiles  in  their  cells,  and  hand  them  off  to  adjacent  base 
stations  when  the  mobile  moves  to  another  cell.  While  cellular  networks  were  invented  to  support 
voice  calls  for  mobile  users,  today’s  mobile  devices  (e.g.,  “smart  phones”  and  tablet  computers) 
are  actually  powerful  computers  with  displays  large  enough  for  users  to  consume  video  on  the  go. 
Thus,  cellular  networks  must  now  support  seamless  access  to  the  Internet.  The  billions  of  mobile 
devices  in  use  easily  outnumber  desktop  and  laptop  computers,  so  that  the  most  important  parts 
of  the  Internet  today  are  arguably  the  cellular  networks  at  its  edge.  Mobile  service  providers  are 
having  great  difficulty  keeping  up  with  the  increase  in  demand  resulting  from  this  convergence 
of  cellular  and  Internet;  by  some  estimates,  the  capacity  of  cellular  networks  must  be  scaled  up 
by  several  orders  of  magnitude,  at  least  in  densely  populated  urban  areas!  As  discussed  in  the 
epilogue,  a  major  challenge  for  the  communication  researcher  and  technologist,  therefore,  is  to 
come  up  with  the  breakthroughs  required  to  deliver  such  capacity  gains. 

Another  major  success  in  wireless  is  WiFi,  a  catchy  term  for  a  class  of  standardized  wireless 
local area network (WLAN) technologies based on the IEEE 802.11 family of standards. Currently,
WiFi networks use unlicensed spectrum in the 2.4 and 5 GHz bands, and have come into
widespread  use  in  both  residential  and  commercial  environments.  WiFi  transceivers  are  now 
incorporated  into  almost  every  computer  and  mobile  device.  One  way  of  alleviating  the  cellular 
capacity crunch that was just mentioned is to offload Internet access to the nearest WiFi network.
Of course, since different WiFi networks are often controlled by different entities, seamless
switching  between  cellular  and  WiFi  is  not  always  possible. 

It  is  instructive  to  devote  some  thought  to  the  contrast  between  cellular  and  WiFi  technologies. 
Cellular  transceivers  and  networks  are  far  more  tightly  engineered.  They  employ  spectrum  that 
mobile  operators  pay  a  great  deal  of  money  to  license,  hence  it  is  critical  to  use  this  spectrum 
efficiently.  Furthermore,  cellular  networks  must  provide  robust  wide-area  coverage  in  the  face  of 
rapid  mobility  (e.g.,  automobiles  at  highway  speeds).  In  contrast,  WiFi  uses  unlicensed  (i.e.,  free!) 
spectrum,  must  only  provide  local  coverage,  and  typically  handles  much  slower  mobility  (e.g., 
pedestrian  motion  through  a  home  or  building).  As  a  result,  WiFi  can  be  more  loosely  engineered 
than  cellular.  It  is  interesting  to  note  that  despite  the  deployment  of  many  uncoordinated 
WiFi  networks  in  an  unlicensed  setting,  WiFi  typically  provides  acceptable  performance,  partly 
because  the  relatively  large  amount  of  unlicensed  spectrum  (especially  in  the  5  GHz  band)  allows 
for  channel  switching  when  encountering  excessive  interference,  and  partly  because  of  naturally 
occurring  spatial  reuse  (WiFi  networks  that  are  “far  enough”  from  each  other  do  not  interfere 
with  each  other).  Of  course,  in  densely  populated  urban  environments  with  many  independently 
deployed  WiFi  networks,  the  performance  can  deteriorate  significantly,  a  phenomenon  sometimes 
referred  to  as  a  tragedy  of  the  commons  (individually  selfish  behavior  leading  to  poor  utilization 
of  a  shared  resource).  As  we  briefly  discuss  in  the  epilogue,  both  the  cellular  and  WiFi  design 
paradigms  need  to  evolve  to  meet  our  future  needs. 

Technology story 3: Moore’s law. Moore’s “law” is actually an empirical observation attributed
to Gordon Moore, one of the founders of Intel Corporation. It can be paraphrased as
saying that the density of transistors in an integrated circuit, and hence the amount of computation
per unit cost, can be expected to increase exponentially over time. This observation has
become  a  self-fulfilling  prophecy,  because  it  has  been  taken  up  by  the  semiconductor  industry 
as  a  growth  benchmark  driving  their  technology  roadmap.  While  Moore’s  law  may  be  slowing 
down  somewhat,  it  has  already  had  a  spectacular  impact  on  the  communications  industry  by 
drastically  lowering  the  cost  and  increasing  the  speed  of  digital  computation.  By  converting 
analog  signals  to  the  digital  domain  as  soon  as  possible,  advanced  transceiver  algorithms  can 
be implemented in digital signal processing (DSP) using low-cost integrated circuits, so that research
breakthroughs in coding and modulation can be quickly transitioned into products. This
leads  to  economies  of  scale  that  have  been  critical  to  the  growth  of  mass  market  products  in  both 
wireless  (e.g.,  cellular  and  WiFi)  and  wireline  (e.g.,  cable  modems  and  DSL)  communication. 


Figure  1.3:  The  Internet  has  a  core  of  routers  and  servers  connected  by  high-speed  fiber  links, 
with wireless networks hanging off the edge (figure courtesy Aseem Wadhwa).


How  do  these  stories  come  together?  The  sketch  in  Figure  1.3  highlights  key  building  blocks 
of  the  Internet  today.  The  core  of  the  network  consists  of  powerful  routers  that  direct  packets 
of  data  from  an  incoming  edge  to  an  outgoing  edge,  and  servers  (often  housed  in  large  data 
centers) that serve up content requested by clients such as personal computers and mobile devices.
The  elements  in  the  core  network  are  connected  by  high-speed  optical  fiber.  Wireless  can  be 
viewed  as  hanging  off  the  edge  of  the  Internet.  Wide  area  cellular  networks  may  have  worldwide 
coverage,  but  each  base  station  is  typically  connected  by  a  high-speed  link  to  the  wired  Internet. 
WiFi  networks  are  wireless  local  area  networks,  typically  deployed  indoors  (but  potentially  also 
providing  outdoor  coverage  for  low-mobility  scenarios)  in  homes  and  office  buildings,  connected  to 
the  Internet  via  last  mile  links,  which  might  run  over  copper  wires  (a  legacy  of  wired  telephony, 
with  transceivers  typically  upgraded  to  support  broadband  Internet  access)  or  coaxial  cable 
(originally  deployed  to  deliver  cable  television,  but  now  also  providing  broadband  Internet  access). 
Some  areas  have  been  upgraded  to  optical  fiber  to  the  curb  or  even  to  the  home,  while  some 
others  might  be  remote  enough  to  require  wireless  last  mile  solutions. 


Figure  1.4:  A  segment  of  a  cellular  network  with  idealized  hexagonal  shapes  (figure  courtesy 
Aseem  Wadhwa). 

Zooming in now on cellular networks, Figure 1.4 shows three adjacent cells in a cellular network
with  hexagonal  cells.  A  working  definition  of  a  cell  is  that  it  is  the  area  around  a  base  station 
where  the  signal  strength  is  higher  than  that  from  other  base  stations.  Of  course,  under  realistic 
propagation  conditions,  cells  are  never  hexagonal,  but  the  concept  of  spatial  reuse  still  holds:  the 
interference  between  distant  cells  can  be  neglected,  hence  they  can  use  the  same  communication 
resources.  For  example,  in  Figure  1.4,  we  might  decide  to  use  three  different  frequency  bands 
in  the  three  cells  shown,  but  might  then  reuse  these  bands  in  other  cells.  Figure  1.4  also  shows 
that  a  user  may  be  simultaneously  in  range  of  multiple  base  stations  when  near  cell  boundaries. 
Crossing  these  boundaries  may  result  in  a  handoff  from  one  base  station  to  another.  In  addition, 
near  cell  boundaries,  a  mobile  device  may  be  in  communication  with  multiple  base  stations 
simultaneously,  a  concept  known  as  soft  handoff. 

It  is  useful  for  a  communication  system  designer  to  be  aware  of  the  preceding  “big  picture”  of 
technology  trends  and  network  architectures  in  order  to  understand  how  to  direct  his  or  her 
talents  as  these  systems  continue  to  evolve  (the  epilogue  contains  more  detailed  speculation 
regarding  this  evolution).  However,  the  first  order  of  business  is  to  acquire  the  fundamentals 
required  to  get  going  in  this  field.  These  are  quite  simply  stated:  a  communication  system 
designer  must  be  comfortable  with  mathematical  modeling  (in  order  to  understand  the  state  of 
the  art,  as  well  as  to  devise  new  models  as  required),  and  with  devising  and  evaluating  signal 
processing  algorithms  based  on  these  models.  The  goal  of  this  textbook  is  to  provide  a  first 
exposure  to  such  a  technical  background. 


1.3  Scope  of  this  Textbook 

Referring  to  the  block  diagram  of  a  digital  communication  system  in  Figure  1.2,  our  focus  in 
this  textbook  is  to  provide  an  introduction  to  design  of  a  digital  communication  link  as  shown 
inside the dashed box. While we are primarily interested in digital communication, circuit designers
implementing such systems must deal with analog waveforms, hence we believe that a
rudimentary  background  in  analog  communication  techniques,  as  provided  in  this  book,  is  useful 
for  the  communication  system  designer.  We  do  not  discuss  source  encoding  and  decoding  in 
this  book;  these  topics  are  highly  specialized  and  technical,  and  doing  them  justice  requires  an 
entire  textbook  of  its  own  at  the  graduate  level.  A  detailed  outline  of  the  book  is  provided  in 
the preface, hence we restrict ourselves here to summarizing the roles of the various chapters:

Chapter 2: introduces the signal processing background required for DSP-centric implementations
of communication transceivers;

Chapter  3:  provides  just  enough  background  on  analog  communication  techniques  (can  be 
skipped  if  only  focused  on  digital  communication); 

Chapter 4: discusses digital modulation techniques;

Chapter  5:  provides  the  probability  background  required  for  receiver  design,  including  noise 
modeling; 

Chapter  6:  discusses  design  and  performance  analysis  of  demodulators  in  digital  communication 
systems  for  idealized  link  models; 

Chapter 7: provides an initial exposure to channel coding techniques and benchmarks;

Chapter 8: provides an introduction to approaches for handling channel dispersion, and to multiple
antenna communication;

Epilogue:  discusses  emerging  trends  shaping  research  and  development  in  communications. 

Chapters  2,  4  and  6  are  core  material  that  must  be  mastered  (much  of  Chapter  5  is  also  core 
material,  but  some  readers  may  already  have  enough  probability  background  that  they  can  skip, 
or  skim,  it).  Chapter  3  is  highly  recommended  for  communication  system  designers  with  interest 
in  radio  frequency  circuit  design,  since  it  highlights,  at  a  high  level,  some  of  the  ideas  and  issues 
that  come  up  there.  Chapters  7  and  8  are  independent  of  each  other,  and  contain  more  advanced 
material that may not always fit within an undergraduate curriculum. They contain “hands-on”
introductions to these topics via code fragments and software labs that hopefully encourage the
reader  to  explore  further. 


1.4  Why  Study  Communication  Systems? 

Before  launching  into  our  formal  study,  it  makes  sense  to  ask  why  the  material  in  this  textbook 
is  worth  studying.  There  are  several  obvious  answers  to  this  question.  The  indispensable  role 
of communications in modern life, and the success of the communications industry, imply that
a  solid  understanding  of  this  material  constitutes  a  valuable  skill  set.  The  vibrant  future  of 
communications  (see  the  epilogue)  ensures  the  continuing  value  of  this  skill  set  for  many  decades 
to come. However, there is also an indirect, and perhaps more fundamental, answer to this
question.  The  design  of  communication  systems  today  represents  a  triumph  of  mathematical 
modeling  and  statistical  signal  processing.  Detailed,  hands-on  experience  building  confidence  in 
such  techniques  is  therefore  excellent  preparation  for  tackling  more  complex  systems  for  which 
complete  mathematical  models  might  not  be  available,  as  the  author  has  discovered  in  his  own 
research.  Examples  of  such  systems  include  the  Internet  itself,  online  social  networks  running  on 
the  Internet,  financial  systems  exhibiting  a  complex  web  of  interdependencies,  as  well  as  signal 
processing,  inference  and  machine  learning  techniques  for  the  huge  volumes  of  data  (“big  data”) 
being  generated  in  a  host  of  other  applications. 


1.5  Concept  Summary 

The  goal  of  this  chapter  is  to  provide  an  intellectual  framework  and  motivation  for  the  rest  of 
this  textbook.  Some  of  the  key  concepts  are  as  follows. 

•  Communication  refers  to  information  transfer  across  either  space  or  time,  where  the  latter 
refers  to  storage  media. 

•  Signals  carrying  information  and  signals  that  can  be  sent  over  a  communication  medium  are 
both inherently analog (i.e., continuous-time, continuous-valued).

•  Analog  communication  corresponds  to  transforming  an  analog  message  waveform  directly  into 
an  analog  transmitted  waveform  at  the  transmitter,  and  undoing  this  transformation  at  the 
receiver. 

•  Digital  communication  corresponds  to  first  reducing  message  waveforms  to  information  bits, 
and  then  transporting  these  bits  over  the  communication  channel. 

•  Digital  communication  requires  the  following  steps:  source  encoding  and  decoding,  modulation 
and  demodulation,  channel  encoding  and  decoding. 

•  While  digital  communication  requires  more  processing  steps  than  analog  communication,  it  has 
the  advantages  of  optimality  and  scalability,  hence  there  is  an  unstoppable  trend  from  analog  to 
digital. 

•  The  growth  in  communication  has  been  driven  by  major  technology  stories  including  the 
Internet,  wireless  and  Moore’s  law. 

•  Key  components  of  the  communication  system  designer’s  toolbox  are  mathematical  modeling 
and  signal  processing. 


1.6  Endnotes 

There  are  a  large  number  of  textbooks  on  communication  systems  at  both  the  undergraduate  and 
graduate  level.  Undergraduate  texts  include  Haykin  [1],  Proakis  and  Salehi  [2],  Pursley  [3],  and 
Ziemer and Tranter [4]. Graduate texts, which typically focus on digital communication, include
Barry,  Lee  and  Messerschmitt  [5],  Benedetto  and  Biglieri  [6],  Madhow  [7],  and  Proakis  and  Salehi 
[8].  The  first  coherent  exposition  of  the  modern  theory  of  communication  receiver  design  is  in 
the  classical  (graduate  level)  textbook  by  Wozencraft  and  Jacobs  [9].  Other  important  classical 
graduate level texts are Viterbi and Omura [10] and Blahut [11]. More specialized references (e.g.,
on  signal  processing,  information  theory,  channel  coding,  wireless  communication)  are  mentioned 
in  later  chapters.  In  addition  to  these  textbooks,  an  overview  of  many  important  topics  can  be 
found  in  the  recently  updated  mobile  communications  handbook  [12]  edited  by  Gibson. 

This  book  is  intended  to  be  accessible  to  readers  who  have  never  been  exposed  to  communication 
systems  before.  It  has  some  overlap  with  more  advanced  graduate  texts  (e.g.,  Chapters  2,  4,  5 
and  6  here  overlap  heavily  with  Chapters  2  and  3  in  the  author’s  own  graduate  text  [7]),  and 
provides  the  technical  background  and  motivation  required  to  easily  access  these  more  advanced 
texts.  Of  course,  the  best  way  to  continue  building  expertise  in  the  field  is  by  actually  working 
in it. Research and development in this field requires study of the research literature, of more
specialized texts (e.g., on information theory, channel coding, synchronization), and of commercial
standards. The Institute of Electrical and Electronics Engineers (IEEE) is responsible for
publication of many conference proceedings and journals in communications: major conferences
include the IEEE Global Telecommunications Conference (Globecom) and the IEEE International
Conference on Communications (ICC); major journals and magazines include IEEE Communications
Magazine, IEEE Transactions on Communications, and IEEE Journal on Selected Areas in
Communications. Closely related fields such as information theory and signal processing have their own
conferences, journals and magazines. Major conferences include the IEEE International Symposium
on Information Theory (ISIT) and the IEEE International Conference on Acoustics, Speech and
Signal Processing (ICASSP); journals include the IEEE Transactions on Information Theory and
the IEEE Transactions on Signal Processing. The IEEE also publishes a number of standards
online, such as the IEEE 802 family of standards for local area networks.

A  useful  resource  for  learning  source  coding  and  data  compression,  which  are  not  discussed  in 
this  text,  is  the  textbook  by  Sayood  [13].  Textbooks  on  core  concepts  in  communication  networks 
include  Bertsekas  and  Gallager  [14],  Kumar,  Manjunath  and  Kuri  [15],  and  Walrand  and  Varaiya 
[16].


Chapter 2

Signals  and  Systems 


A  communication  link  involves  several  stages  of  signal  manipulation:  the  transmitter  transforms 
the  message  into  a  signal  that  can  be  sent  over  a  communication  channel;  the  channel  distorts 
the  signal  and  adds  noise  to  it;  and  the  receiver  processes  the  noisy  received  signal  to  extract 
the  message.  Thus,  communication  systems  design  must  be  based  on  a  sound  understanding  of 
signals,  and  the  systems  that  shape  them.  In  this  chapter,  we  discuss  concepts  and  terminology 
from  signals  and  systems,  with  a  focus  on  how  we  plan  to  apply  them  in  our  discussion  of 
communication  systems.  Much  of  this  chapter  is  a  review  of  concepts  with  which  the  reader 
might  already  be  familiar  from  prior  exposure  to  signals  and  systems.  However,  special  attention 
should  be  paid  to  the  discussion  of  baseband  and  passband  signals  and  systems  (Sections  2.7 
and  2.8).  This  material,  which  is  crucial  for  our  purpose,  is  typically  not  emphasized  in  a  first 
course  on  signals  and  systems.  Additional  material  on  the  geometric  relationship  between  signals 
is  covered  in  later  chapters,  when  we  discuss  digital  communication. 

Chapter Plan: After a review of complex numbers and complex arithmetic in Section 2.1, we
provide some examples of useful signals in Section 2.2. We then discuss LTI systems and convolution
in Section 2.3. This is followed by Fourier series (Section 2.4) and Fourier transform (Section
2.5). These sections (Sections 2.1 through 2.5) correspond to a review of material that
is  part  of  the  assumed  background  for  the  core  content  of  this  textbook.  However,  even  readers 
familiar  with  the  material  are  encouraged  to  skim  through  it  quickly  in  order  to  gain  familiarity 
with  the  notation.  This  gets  us  to  the  point  where  we  can  classify  signals  and  systems  based 
on  the  frequency  band  they  occupy.  Specifically,  we  discuss  baseband  and  passband  signals  and 
systems  in  Sections  2.7  and  2.8.  Messages  are  typically  baseband,  while  signals  sent  over  channels 
(especially  radio  channels)  are  typically  passband.  We  discuss  methods  for  going  from  baseband 
to passband and back. We specifically emphasize the fact that a real-valued passband signal is
equivalent (in a mathematically convenient and physically meaningful sense) to a complex-valued
baseband signal, called the complex baseband representation, or complex envelope, of the passband
signal. We note that the information carried by a passband signal resides in its complex
envelope,  so  that  modulation  (or  the  process  of  encoding  messages  in  waveforms  that  can  be 
sent  over  physical  channels)  consists  of  mapping  information  into  a  complex  envelope,  and  then 
converting  this  complex  envelope  into  a  passband  signal.  We  discuss  the  physical  significance 
of  the  rectangular  form  of  the  complex  envelope,  which  corresponds  to  the  in-phase  (I)  and 
quadrature  (Q)  components  of  the  passband  signal,  and  that  of  the  polar  form  of  the  complex 
envelope,  which  corresponds  to  the  envelope  and  phase  of  the  passband  signal.  We  conclude  by 
discussing  the  role  of  complex  baseband  in  transceiver  implementations,  and  by  illustrating  its 
use  for  wireless  channel  modeling. 

Software:  The  software  labs  in  this  chapter  introduce  the  use  of  Matlab  for  signal  processing. 
They provide practice in writing Matlab code from scratch (i.e., without using prepackaged
routines or Simulink) for simple computations. Software Lab 2.0 is an introduction to the use of
Matlab for typical operations of interest to us, and illustrates how we approximate continuous
time  operations  in  discrete  time.  Software  Lab  2.1  shows  how  to  model  and  undo  the  effects  of 
carrier  phase  offsets  in  complex  baseband.  Software  Lab  2.2  develops  complex  baseband  models 
for  wireless  multipath  channels,  and  explores  the  phenomenon  of  signal  fading  due  to  constructive 
and  destructive  interference  between  the  paths. 


2.1  Complex  Numbers 


Figure 2.1: A complex number z represented in the two-dimensional real plane.


A complex number $z$ can be written as $z = x + jy$, where $x$ and $y$ are real numbers, and $j = \sqrt{-1}$.
We say that $x = \mathrm{Re}(z)$ is the real part of $z$ and $y = \mathrm{Im}(z)$ is the imaginary part of $z$. As depicted
in Figure 2.1, it is often advantageous to interpret the complex number $z$ as a two-dimensional
real vector, which can be represented in rectangular form as $(x, y) = (\mathrm{Re}(z), \mathrm{Im}(z))$, or in polar
form $(r, \theta)$ as

$$r = |z| = \sqrt{x^2 + y^2}, \qquad \theta = \angle z = \tan^{-1}\frac{y}{x} \tag{2.1}$$

We can go back from polar form to rectangular form as follows:

$$x = r\cos\theta, \qquad y = r\sin\theta \tag{2.2}$$
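The conversions between rectangular and polar form can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names `to_polar` and `to_rect` are hypothetical helpers, not notation from the text. One assumption worth flagging: the sketch uses `math.atan2(y, x)` rather than a plain arctangent of $y/x$, because the ratio $y/x$ alone cannot distinguish $z$ from $-z$.

```python
import cmath
import math

def to_polar(z):
    """Return (r, theta) for z = x + jy, with theta in (-pi, pi]."""
    r = math.hypot(z.real, z.imag)        # r = sqrt(x^2 + y^2)
    theta = math.atan2(z.imag, z.real)    # quadrant-aware arctangent of y/x
    return r, theta

def to_rect(r, theta):
    """Return x + jy with x = r cos(theta) and y = r sin(theta)."""
    return complex(r * math.cos(theta), r * math.sin(theta))

z = -1 + 1j                               # a point in the second quadrant
r, theta = to_polar(z)
z_back = to_rect(r, theta)

print(r, theta)                           # magnitude and angle of z
print(z_back)                             # should be close to z
print(cmath.polar(z))                     # standard-library equivalent of to_polar
```

For this second-quadrant example, a plain arctangent of $y/x = -1$ would return $-\pi/4$, which is the angle of $-z$ rather than of $z$.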


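Complex conjugation: For a complex number $z = x + jy = re^{j\theta}$, its complex conjugate is

$$z^* = x - jy = re^{-j\theta} \tag{2.3}$$

That is,

$$\mathrm{Re}(z^*) = \mathrm{Re}(z), \quad \mathrm{Im}(z^*) = -\mathrm{Im}(z), \quad |z^*| = |z|, \quad \angle z^* = -\angle z \tag{2.4}$$

The real and imaginary parts of a complex number $z$ can be written in terms of $z$ and $z^*$ as
follows:

$$\mathrm{Re}(z) = \frac{z + z^*}{2}, \qquad \mathrm{Im}(z) = \frac{z - z^*}{2j} \tag{2.5}$$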

Euler’s formula: This formula is of fundamental importance in complex analysis, and relates
the rectangular and polar forms of a complex number:

$$e^{j\theta} = \cos\theta + j\sin\theta \tag{2.6}$$

The complex conjugate of $e^{j\theta}$ is given by

$$e^{-j\theta} = \left(e^{j\theta}\right)^* = \cos\theta - j\sin\theta$$

We can express cosines and sines in terms of $e^{j\theta}$ and its complex conjugate as follows:

$$\mathrm{Re}\left(e^{j\theta}\right) = \frac{e^{j\theta} + e^{-j\theta}}{2} = \cos\theta, \qquad \mathrm{Im}\left(e^{j\theta}\right) = \frac{e^{j\theta} - e^{-j\theta}}{2j} = \sin\theta \tag{2.7}$$


Applying Euler’s formula to (2.2), we can write

$$z = x + jy = r\cos\theta + jr\sin\theta = re^{j\theta} \tag{2.8}$$

Being  able  to  go  back  and  forth  between  the  rectangular  and  polar  forms  of  a  complex  number 
is  useful.  For  example,  it  is  easier  to  add  in  the  rectangular  form,  but  it  is  easier  to  multiply  in 
the  polar  form. 

Complex Addition: For two complex numbers $z_1 = x_1 + jy_1$ and $z_2 = x_2 + jy_2$,

$$z_1 + z_2 = (x_1 + x_2) + j(y_1 + y_2) \tag{2.9}$$

That is,

$$\mathrm{Re}(z_1 + z_2) = \mathrm{Re}(z_1) + \mathrm{Re}(z_2), \qquad \mathrm{Im}(z_1 + z_2) = \mathrm{Im}(z_1) + \mathrm{Im}(z_2) \tag{2.10}$$

Complex Multiplication (rectangular form): For two complex numbers $z_1 = x_1 + jy_1$ and
$z_2 = x_2 + jy_2$,

$$z_1 z_2 = (x_1 x_2 - y_1 y_2) + j(y_1 x_2 + x_1 y_2) \tag{2.11}$$

This follows simply by multiplying out, and setting $j^2 = -1$. We have

$$\mathrm{Re}(z_1 z_2) = \mathrm{Re}(z_1)\mathrm{Re}(z_2) - \mathrm{Im}(z_1)\mathrm{Im}(z_2), \qquad \mathrm{Im}(z_1 z_2) = \mathrm{Im}(z_1)\mathrm{Re}(z_2) + \mathrm{Re}(z_1)\mathrm{Im}(z_2) \tag{2.12}$$

Note that, using the rectangular form, a single complex multiplication requires four real
multiplications.
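To make the count of real multiplications concrete, here is a minimal Python sketch of (2.11) that works on the real and imaginary parts directly. The function name `complex_multiply` and the test values are illustrative assumptions.

```python
def complex_multiply(x1, y1, x2, y2):
    """Multiply (x1 + j*y1) by (x2 + j*y2) as in (2.11).

    Uses exactly four real multiplications and two real additions.
    Returns the pair (real part, imaginary part).
    """
    real_part = x1 * x2 - y1 * y2
    imag_part = y1 * x2 + x1 * y2
    return real_part, imag_part

# (1 + 2j)(3 - 4j) = 3 - 4j + 6j + 8 = 11 + 2j
re_part, im_part = complex_multiply(1.0, 2.0, 3.0, -4.0)
builtin_product = complex(1.0, 2.0) * complex(3.0, -4.0)

print(re_part, im_part)     # 11.0 2.0
print(builtin_product)      # (11+2j)
```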

Complex Multiplication (polar form): Complex multiplication is easier when the numbers
are expressed in polar form. For $z_1 = r_1 e^{j\theta_1}$, $z_2 = r_2 e^{j\theta_2}$, we have

$$z_1 z_2 = r_1 r_2 e^{j(\theta_1 + \theta_2)} \tag{2.13}$$

That is,

$$|z_1 z_2| = |z_1||z_2|, \qquad \angle(z_1 z_2) = \angle z_1 + \angle z_2 \tag{2.14}$$

Division: For two complex numbers $z_1 = x_1 + jy_1 = r_1 e^{j\theta_1}$ and $z_2 = x_2 + jy_2 = r_2 e^{j\theta_2}$ (with
$z_2 \ne 0$, i.e., $r_2 > 0$), it is easiest to express the result of division in polar form:

$$z_1/z_2 = (r_1/r_2)e^{j(\theta_1 - \theta_2)} \tag{2.15}$$

That is,

$$|z_1/z_2| = |z_1|/|z_2|, \qquad \angle(z_1/z_2) = \angle z_1 - \angle z_2 \tag{2.16}$$

In order to divide using rectangular form, it is convenient to multiply numerator and denominator
by $z_2^*$, which gives

$$z_1/z_2 = z_1 z_2^*/(z_2 z_2^*) = z_1 z_2^*/|z_2|^2$$

Multiplying out as usual, we get

$$z_1/z_2 = \frac{(x_1 + jy_1)(x_2 - jy_2)}{x_2^2 + y_2^2} = \frac{(x_1 x_2 + y_1 y_2) + j(-x_1 y_2 + y_1 x_2)}{x_2^2 + y_2^2} \tag{2.17}$$


Example 2.1.1 (Computations with complex numbers) Consider the complex numbers
$z_1 = 1 + j$ and $z_2 = 2e^{-j\pi/6}$. Find $z_1 + z_2$, $z_1 z_2$, and $z_1/z_2$. Also specify $z_1^*$, $z_2^*$.

For complex addition, it is convenient to express both numbers in rectangular form. Thus,

$$z_2 = 2\left(\cos(-\pi/6) + j\sin(-\pi/6)\right) = \sqrt{3} - j$$

and

$$z_1 + z_2 = (1 + j) + (\sqrt{3} - j) = \sqrt{3} + 1$$

For complex multiplication and division, it is convenient to express both numbers in polar form.
We obtain $z_1 = \sqrt{2}e^{j\pi/4}$ by applying (2.1). Now, from (2.13), we have

$$z_1 z_2 = \sqrt{2}e^{j\pi/4} \cdot 2e^{-j\pi/6} = 2\sqrt{2}e^{j(\pi/4 - \pi/6)} = 2\sqrt{2}e^{j\pi/12}$$

Similarly,

$$\frac{z_1}{z_2} = \frac{\sqrt{2}e^{j\pi/4}}{2e^{-j\pi/6}} = \frac{\sqrt{2}}{2}e^{j(\pi/4 + \pi/6)} = \frac{1}{\sqrt{2}}e^{j5\pi/12}$$

Multiplication using the rectangular forms of the complex numbers yields the following:

$$z_1 z_2 = (1 + j)(\sqrt{3} - j) = \sqrt{3} - j + \sqrt{3}j + 1 = (\sqrt{3} + 1) + j(\sqrt{3} - 1)$$

Note that $z_1^* = 1 - j = \sqrt{2}e^{-j\pi/4}$ and $z_2^* = 2e^{j\pi/6} = \sqrt{3} + j$. Division using rectangular forms
gives

$$\frac{z_1}{z_2} = \frac{z_1 z_2^*}{|z_2|^2} = \frac{(1 + j)(\sqrt{3} + j)}{2^2} = \frac{\sqrt{3} - 1}{4} + j\frac{\sqrt{3} + 1}{4}$$
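The results of Example 2.1.1 can be cross-checked numerically. The following Python sketch, using only the standard `cmath` and `math` modules, compares the polar and rectangular closed forms from the example against direct complex arithmetic; the variable names are illustrative.

```python
import cmath
import math

z1 = 1 + 1j
z2 = cmath.rect(2, -math.pi / 6)          # 2 e^{-j pi/6} = sqrt(3) - j

total = z1 + z2
product = z1 * z2
quotient = z1 / z2

# Closed forms obtained in the example
total_expected = complex(math.sqrt(3) + 1, 0.0)
product_polar = cmath.rect(2 * math.sqrt(2), math.pi / 12)
product_rect = complex(math.sqrt(3) + 1, math.sqrt(3) - 1)
quotient_polar = cmath.rect(1 / math.sqrt(2), 5 * math.pi / 12)
quotient_rect = complex((math.sqrt(3) - 1) / 4, (math.sqrt(3) + 1) / 4)

for name, computed, expected in [
    ("sum", total, total_expected),
    ("product (polar)", product, product_polar),
    ("product (rect)", product, product_rect),
    ("quotient (polar)", quotient, quotient_polar),
    ("quotient (rect)", quotient, quotient_rect),
]:
    print(name, cmath.isclose(computed, expected, abs_tol=1e-12))
```

Each line should report agreement between the two ways of computing the same quantity, up to floating-point rounding.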


No need to memorize trigonometric identities any more: Once we can do computations
using complex numbers, we can use Euler’s formula to quickly derive well-known trigonometric
identities involving sines and cosines. For example,

$$\cos(\theta_1 + \theta_2) = \mathrm{Re}\left(e^{j(\theta_1 + \theta_2)}\right)$$

But

$$e^{j(\theta_1 + \theta_2)} = e^{j\theta_1}e^{j\theta_2} = (\cos\theta_1 + j\sin\theta_1)(\cos\theta_2 + j\sin\theta_2)$$
$$= (\cos\theta_1\cos\theta_2 - \sin\theta_1\sin\theta_2) + j(\cos\theta_1\sin\theta_2 + \sin\theta_1\cos\theta_2)$$

Taking the real part, we can read off the identity

$$\cos(\theta_1 + \theta_2) = \cos\theta_1\cos\theta_2 - \sin\theta_1\sin\theta_2 \tag{2.18}$$

Moreover, taking the imaginary part, we can read off

$$\sin(\theta_1 + \theta_2) = \cos\theta_1\sin\theta_2 + \sin\theta_1\cos\theta_2 \tag{2.19}$$


2.2  Signals 

Signal: A signal $s(t)$ is a function of time (or some other independent variable, such as
frequency, or spatial coordinates) which has an interesting physical interpretation. For example, it
is generated by a transmitter, or processed by a receiver. While physically realizable signals such
as those sent over a wire or over the air must take real values, we shall see that it is extremely
useful (and physically meaningful) to consider a pair of real-valued signals, interpreted as the
real and imaginary parts of a complex-valued signal. Thus, in general, we allow signals to take
complex values.

Discrete versus Continuous Time: We generically use the notation $x(t)$ to denote continuous
time signals ($t$ taking real values), and $x[n]$ to denote discrete time signals ($n$ taking integer
values). A continuous time signal $x(t)$ sampled at rate $1/T_s$ (that is, once every $T_s$ seconds)
produces discrete time samples $x(nT_s + t_0)$ ($t_0$ an arbitrary offset), which we often denote as a
discrete time signal $x[n]$. While signals sent over a physical communication channel are inherently
continuous time, implementations at both the transmitter and receiver make heavy use of discrete
time processing of digitized samples corresponding to the analog continuous time waveforms of interest.
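As a small illustration of the sampling notation, the Python sketch below forms $x[n] = x(nT_s + t_0)$ from a continuous time signal represented as an ordinary function. The helper name `sample`, the 1 Hz cosine, and the sampling interval are assumptions chosen for illustration.

```python
import math

def sample(x, Ts, t0, num_samples):
    """Return the list [x(n*Ts + t0) for n = 0, 1, ..., num_samples - 1]."""
    return [x(n * Ts + t0) for n in range(num_samples)]

def x(t):
    # Hypothetical continuous time signal: a cosine at 1 Hz
    return math.cos(2 * math.pi * 1.0 * t)

Ts = 0.25                                  # sampling interval (seconds); rate is 1/Ts = 4 Hz
samples = sample(x, Ts, t0=0.0, num_samples=5)
print(samples)                             # approximately [1, 0, -1, 0, 1]
```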

We  now  introduce  some  signals  that  recur  often  in  this  text. 

Sinusoid: This is a periodic function of time of the form

$$s(t) = A\cos(2\pi f_0 t + \theta) \tag{2.20}$$

where $A > 0$ is the amplitude, $f_0$ is the frequency, and $\theta \in [0, 2\pi]$ is the phase. By setting $\theta = 0$,
we obtain a pure cosine $A\cos 2\pi f_0 t$, and by setting $\theta = -\frac{\pi}{2}$, we obtain a pure sine $A\sin 2\pi f_0 t$.
In general, using (2.18), we can rewrite (2.20) as

$$s(t) = A_c\cos 2\pi f_0 t - A_s\sin 2\pi f_0 t \tag{2.21}$$

where $A_c = A\cos\theta$ and $A_s = A\sin\theta$ are real numbers. Using Euler’s formula, we can write

$$Ae^{j\theta} = A_c + jA_s \tag{2.22}$$

Thus, the parameters of a sinusoid at frequency $f_0$ can be represented by the complex number in
(2.22), with (2.20) using the polar form, and (2.21) the rectangular form, of this number. Note
that $A = \sqrt{A_c^2 + A_s^2}$ and $\theta = \tan^{-1}\frac{A_s}{A_c}$.
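The equivalence of the polar form (2.20) and the rectangular form (2.21) is easy to confirm numerically. The following Python sketch does so for one arbitrarily chosen (hypothetical) set of parameters, and then recovers $A$ and $\theta$ from $A_c$ and $A_s$; it uses `math.atan2` as a quadrant-aware arctangent, which is an implementation choice rather than something stated in the text.

```python
import math

A, theta, f0 = 2.0, math.pi / 3, 5.0       # hypothetical amplitude, phase, frequency

Ac = A * math.cos(theta)                   # in-phase coefficient
As = A * math.sin(theta)                   # quadrature coefficient

def s_polar(t):
    """Sinusoid in the form (2.20)."""
    return A * math.cos(2 * math.pi * f0 * t + theta)

def s_rect(t):
    """Sinusoid in the form (2.21)."""
    return Ac * math.cos(2 * math.pi * f0 * t) - As * math.sin(2 * math.pi * f0 * t)

times = [k / 1000 for k in range(1000)]    # one second of time instants
max_gap = max(abs(s_polar(t) - s_rect(t)) for t in times)

A_recovered = math.hypot(Ac, As)           # sqrt(Ac^2 + As^2)
theta_recovered = math.atan2(As, Ac)       # quadrant-aware arctangent of As/Ac

print(max_gap)                             # tiny (floating-point rounding only)
print(A_recovered, theta_recovered)
```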

Clearly, sinusoids with known amplitude, phase and frequency are perfectly predictable, and
hence cannot carry any information. As we shall see, information can be transmitted by making
the complex number $Ae^{j\theta} = A_c + jA_s$ associated with the parameters of a sinusoid vary in a way
that depends on the message to be conveyed. Of course, once this is done, the resulting signal
will no longer be a pure sinusoid, and part of the work of the communication system designer is
to decide what shape such a signal should take in the frequency domain.

We  now  define  complex  exponentials,  which  play  a  key  role  in  understanding  signals  and  systems 
in  the  frequency  domain. 

Complex exponential: A complex exponential at a frequency $f_0$ is defined as

$$s(t) = Ae^{j(2\pi f_0 t + \theta)} = \alpha e^{j2\pi f_0 t} \tag{2.23}$$

where $A > 0$ is the amplitude, $f_0$ is the frequency, $\theta \in [0, 2\pi]$ is the phase, and $\alpha = Ae^{j\theta}$ is a
complex number that contains both the amplitude and phase information. Let us now make
three observations. First, note the ease with which we handle amplitude and phase for complex
exponentials: they simply combine into a complex number that factors out of the complex
exponential. Second, by Euler’s formula,

$$\mathrm{Re}\left(Ae^{j(2\pi f_0 t + \theta)}\right) = A\cos(2\pi f_0 t + \theta)$$

so that real-valued sinusoids are “contained in” complex exponentials. Third, as we shall soon
see, the set of complex exponentials $\{e^{j2\pi ft}\}$, where $f$ takes values in $(-\infty, \infty)$, form a “basis”
for a large class of signals (basically, for all signals that are of interest to us), and the Fourier
transform of a signal is simply its expansion with respect to this basis. Such observations are
key to why complex exponentials play such an important role in signals and systems in general,
and in communication systems in particular.

Figure 2.2: The impulse function may be viewed as a limit of tall thin pulses ($a \to 0$ in the
examples shown in the figure).

Figure 2.3: Multiplying a signal with a tall thin pulse to select its value at $t_0$.

The Delta, or Impulse, Function: Another signal that plays a crucial role in signals and
systems is the delta function, or the unit impulse, which we denote by $\delta(t)$. Physically, we can think
of it as a narrow, tall pulse with unit area: examples are shown in Figure 2.2. Mathematically,
we can think of it as a limit of such pulses as the pulse width shrinks (and hence the pulse height
goes to infinity). Such a limit is not physically realizable, but it serves a very useful purpose in
terms of understanding the structure of physically realizable signals. That is, consider a signal
$s(t)$ that varies smoothly, and multiply it with a tall, thin pulse $p(t)$ of unit area, centered at time
$t_0$ and nonzero only over $[t_0 - a_1, t_0 + a_2]$, as shown in Figure 2.3. If we now integrate the product, we obtain

$$\int_{-\infty}^{\infty} s(t)p(t)\,dt = \int_{t_0 - a_1}^{t_0 + a_2} s(t)p(t)\,dt \approx s(t_0)\int_{t_0 - a_1}^{t_0 + a_2} p(t)\,dt = s(t_0)$$

That is, the preceding operation “selects” the value of the signal at time $t_0$. Taking the limit of
the tall thin pulse as its width $a_1 + a_2 \to 0$, we get a translated version of the delta function,
namely, $\delta(t - t_0)$. Note that the exact shape of the pulse does not matter in the preceding
argument. The delta function is therefore defined by means of the following sifting property: for
any “smooth” function $s(t)$, we have

$$\int_{-\infty}^{\infty} s(t)\delta(t - t_0)\,dt = s(t_0) \qquad \text{(sifting property of the impulse)} \tag{2.24}$$
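The limiting argument behind the sifting property can be observed numerically. The Python sketch below multiplies a smooth signal by a rectangular pulse of unit area and shrinking width, and integrates the product with a midpoint Riemann sum. The symmetric rectangular pulse, the test signal, and the helper name `thin_pulse_integral` are all illustrative assumptions; as noted above, the exact pulse shape should not matter.

```python
import math

def thin_pulse_integral(s, t0, width, steps=1000):
    """Approximate the integral of s(t) p(t) dt, where p is a rectangular
    pulse of unit area (height 1/width) centered at t0."""
    height = 1.0 / width
    dt = width / steps
    start = t0 - width / 2
    # Midpoint Riemann sum over the support of the pulse
    return sum(s(start + (k + 0.5) * dt) * height * dt for k in range(steps))

def s(t):
    # Hypothetical smooth signal
    return math.cos(t) + t ** 2

t0 = 0.7
for width in (1.0, 0.1, 0.01):
    approx = thin_pulse_integral(s, t0, width)
    print(width, approx, abs(approx - s(t0)))   # error shrinks with the width
```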


Thus, the delta function is defined mathematically by the way it acts on other signals, rather
than as a signal by itself. However, it is also important to keep in mind its intuitive interpretation
as (the limit of) a tall, thin pulse of unit area.

The following function is useful for expressing signals compactly.

Indicator function: We use $I_A$ to denote the indicator function of a set $A$, defined as

$$I_A(x) = \begin{cases} 1, & x \in A \\ 0, & \text{otherwise} \end{cases}$$

The indicator function of an interval is a rectangular pulse, as shown in Figure 2.4.

Figure 2.4: The indicator function of an interval is a rectangular pulse.

Figure 2.5: The functions $u(t) = 2(1 - |t|)I_{[-1,1]}(t)$ and $v(t) = 3I_{[-1,0]}(t) + I_{[0,1]}(t) - I_{[1,2]}(t)$ can
be written compactly in terms of indicator functions.

The indicator function can also be used to compactly express more complex signals, as shown in
the examples in Figure 2.5.

Sinc function: The sinc function, plotted in Figure 2.6, is defined as

$$\mathrm{sinc}(x) = \frac{\sin(\pi x)}{\pi x}$$

where the value at $x = 0$ is defined as the limit as $x \to 0$ to be $\mathrm{sinc}(0) = 1$. Since $|\sin(\pi x)| \le 1$,
we have that $|\mathrm{sinc}(x)| \le \frac{1}{\pi|x|}$, with equality if and only if $x$ is an odd multiple of 1/2. That is,
the sinc function exhibits a sinusoidal variation, with an envelope that decays as $\frac{1}{|x|}$.
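A direct implementation of this definition has to treat $x = 0$ separately. The following Python sketch does so and then checks the envelope bound on a grid of points; the function name `sinc` mirrors the text, while the grid is an arbitrary choice.

```python
import math

def sinc(x):
    """sinc(x) = sin(pi x) / (pi x), with sinc(0) defined as the limit 1."""
    if x == 0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

# Check the envelope bound |sinc(x)| <= 1 / (pi |x|) on a grid of nonzero points
grid = [k / 10 for k in range(-50, 51) if k != 0]
bound_holds = all(abs(sinc(x)) <= 1 / (math.pi * abs(x)) + 1e-15 for x in grid)

print(sinc(0), sinc(0.5), sinc(1.0))       # 1.0, 2/pi, approximately 0
print(bound_holds)
```

At odd multiples of 1/2, such as $x = 0.5$ and $x = 1.5$, the bound is met with equality, while at nonzero integers the sinc function is zero.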

The analogy between signals and vectors: Even though signals can be complicated functions
of time that live in an infinite-dimensional space, the mathematics for manipulating them are
very similar to those for manipulating finite-dimensional vectors, with sums replaced by integrals.
A key building block of communication theory is the relative geometry of the signals used, which
is governed by the inner products between signals. Inner products for continuous-time signals can
be defined in a manner exactly analogous to the corresponding definitions in finite-dimensional
vector space.

Inner Product: The inner product for two $m \times 1$ complex vectors $\mathbf{s} = (s[1], \ldots, s[m])^T$ and
$\mathbf{r} = (r[1], \ldots, r[m])^T$ is given by

$$\langle \mathbf{s}, \mathbf{r} \rangle = \sum_{i=1}^{m} s[i]r^*[i] = \mathbf{r}^H\mathbf{s} \tag{2.25}$$

Figure 2.6: The sinc function.


Similarly, we define the inner product of two (possibly complex-valued) signals $s(t)$ and $r(t)$ as
follows:

$$\langle s, r \rangle = \int_{-\infty}^{\infty} s(t)r^*(t)\,dt \tag{2.26}$$

The inner product obeys the following linearity properties:

$$\langle a_1 s_1 + a_2 s_2, r \rangle = a_1\langle s_1, r \rangle + a_2\langle s_2, r \rangle$$
$$\langle s, a_1 r_1 + a_2 r_2 \rangle = a_1^*\langle s, r_1 \rangle + a_2^*\langle s, r_2 \rangle$$

where $a_1$, $a_2$ are complex-valued constants, and $s$, $s_1$, $s_2$, $r$, $r_1$, $r_2$ are signals (or vectors). The
complex conjugation when we pull out constants from the second argument of the inner product
is something that we need to maintain awareness of when computing inner products for
complex-valued signals.
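The conjugation in the second argument is easy to get wrong in code, so a small check is worthwhile. The Python sketch below implements the vector inner product (2.25) for lists of complex numbers and verifies that a constant pulled out of the second argument comes out conjugated. The function name `inner` and the example vectors are illustrative assumptions.

```python
def inner(s, r):
    """Inner product <s, r> = sum_i s[i] * conj(r[i]) of equal-length complex sequences."""
    if len(s) != len(r):
        raise ValueError("vectors must have the same length")
    return sum(si * ri.conjugate() for si, ri in zip(s, r))

s = [1 + 1j, 2 + 0j, -1j]
r = [1j, 1 - 1j, 3 + 0j]
a = 2 - 3j

base = inner(s, r)                                   # (1 - 1j) + (2 + 2j) + (-3j) = 3 - 2j
scaled_first = inner([a * si for si in s], r)        # equals a * <s, r>
scaled_second = inner(s, [a * ri for ri in r])       # equals conj(a) * <s, r>

print(base)
print(scaled_first == a * base)
print(scaled_second == a.conjugate() * base)
print(inner(s, s))                                   # sum of |s[i]|^2 = 2 + 4 + 1 = 7
```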

Energy and Norm: The energy $E_s$ of a signal $s$ is defined as its inner product with itself:

$$E_s = \|s\|^2 = \langle s, s \rangle = \int_{-\infty}^{\infty} |s(t)|^2\,dt \tag{2.27}$$

where $\|s\|$ denotes the norm of $s$. If the energy of $s$ is zero, then $s$ must be zero “almost
everywhere” (e.g., $s(t)$ cannot be nonzero over any interval, no matter how small its length).
For continuous-time signals, we take this to be equivalent to being zero everywhere. With this
understanding, $\|s\| = 0$ implies that $s$ is zero, which is a property that is true for norms in
finite-dimensional vector spaces.


Example 2.2.1 (Energy computations) Consider $s(t) = 2I_{[0,T]}(t) + jI_{[T/2,2T]}(t)$. Writing it out in
more detail, we have

$$s(t) = \begin{cases} 2, & 0 \le t < T/2 \\ 2 + j, & T/2 \le t \le T \\ j, & T < t \le 2T \end{cases}$$

so that its energy is given by

$$\|s\|^2 = \int_0^{T/2} 2^2\,dt + \int_{T/2}^{T} |2 + j|^2\,dt + \int_T^{2T} |j|^2\,dt = 4(T/2) + 5(T/2) + T = 11T/2$$

As another example, consider $s(t) = e^{-3|t| + j2\pi t}$, for which the energy is given by

$$\|s\|^2 = \int_{-\infty}^{\infty} \left|e^{-3|t| + j2\pi t}\right|^2 dt = \int_{-\infty}^{\infty} e^{-6|t|}\,dt = 2\int_0^{\infty} e^{-6t}\,dt = \frac{1}{3}$$

Note that the complex phase term $j2\pi t$ does not affect the energy, since it goes away when we
take the magnitude.
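Both energies in Example 2.2.1 can be approximated by replacing the integral with a Riemann sum, in the spirit of approximating continuous time operations in discrete time. The Python sketch below assumes $T = 1$ for the first signal and truncates the second integral to $[-10, 10]$, where the neglected tails are negligible; the helper name `energy` and the step count are illustrative choices.

```python
import cmath
import math

def energy(s, t_start, t_end, steps=20000):
    """Midpoint Riemann-sum approximation of the integral of |s(t)|^2 over [t_start, t_end]."""
    dt = (t_end - t_start) / steps
    return sum(abs(s(t_start + (k + 0.5) * dt)) ** 2 for k in range(steps)) * dt

T = 1.0                                    # assumed value of T for the numerical check

def s1(t):
    """s(t) = 2 I_[0,T](t) + j I_[T/2,2T](t)."""
    value = 0j
    if 0 <= t <= T:
        value += 2
    if T / 2 <= t <= 2 * T:
        value += 1j
    return value

def s2(t):
    """s(t) = exp(-3|t| + j 2 pi t)."""
    return cmath.exp(-3 * abs(t) + 2j * math.pi * t)

E1 = energy(s1, 0.0, 2 * T)                # expected 11T/2 = 5.5
E2 = energy(s2, -10.0, 10.0)               # expected 1/3

print(E1, E2)
```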


Power: The power of a signal $s(t)$ is defined as the time average of its energy computed over a
large time interval:

$$P_s = \lim_{T_0 \to \infty} \frac{1}{T_0}\int_{-T_0/2}^{T_0/2} |s(t)|^2\,dt \tag{2.28}$$

Finite  energy  signals,  of  course,  have  zero  power. 

We  see  from  (2.28)  that  power  is  defined  as  a  time  average.  It  is  useful  to  introduce  a  compact 
notation  for  time  averages. 

Time average: For a function $g(t)$, define the time average as

$$\overline{g} = \lim_{T_0 \to \infty} \frac{1}{T_0}\int_{-T_0/2}^{T_0/2} g(t)\,dt \tag{2.29}$$

That is, we compute the time average over an observation interval of length $T_0$, and then let
the observation interval get large. We can now rewrite the power computation in (2.28) in this
notation as follows.

Power: The power of a signal $s(t)$ is defined as

$$P_s = \overline{|s(t)|^2} \tag{2.30}$$


Another  time  average  of  interest  is  the  DC  value  of  a  signal. 

DC value: The DC value of $s(t)$ is defined as $\overline{s(t)}$.

Let us compute these quantities for the simple example of a complex exponential, $s(t) =
Ae^{j(2\pi f_0 t + \theta)}$, where $A > 0$ is the amplitude, $\theta \in [0, 2\pi]$ is the phase, and $f_0$ is a real-valued
frequency. Since $|s(t)|^2 = A^2$ for all $t$, we get the same value when we average it. Thus, the
power is given by $P_s = \overline{|s(t)|^2} = A^2$. For nonzero frequency $f_0$, it is intuitively clear that all the
power in $s$ is concentrated away from DC, since $s(t) = Ae^{j(2\pi f_0 t + \theta)} \leftrightarrow S(f) = Ae^{j\theta}\delta(f - f_0)$.
We therefore see that the DC value is zero. While this is a convincing intuitive argument, it is
instructive to prove this starting from the definition (2.29).

Proving that a complex exponential has zero DC value: For $s(t) = Ae^{j(2\pi f_0 t + \theta)}$, the
integral over its period (of length $1/f_0$) is zero. As shown in Figure 2.7, the length $L$ of any
interval $I$ can be written as $L = K/f_0 + \ell$, where $K$ is a nonnegative integer and $0 \le \ell < 1/f_0$ is
the length of the remaining interval $I_r$. Since the integral over an integer number of periods is
zero, we have

$$\int_I s(t)\,dt = \int_{I_r} s(t)\,dt$$

Thus,

$$\left|\int_I s(t)\,dt\right| = \left|\int_{I_r} s(t)\,dt\right| \le \ell \max_t |s(t)| = A\ell < \frac{A}{f_0}$$

since $|s(t)| = A$. We therefore obtain

$$\left|\int_{-T_0/2}^{T_0/2} s(t)\,dt\right| \le \frac{A}{f_0}$$

which yields that the DC value $\overline{s} = 0$, since

$$|\overline{s}| = \lim_{T_0 \to \infty} \left|\frac{1}{T_0}\int_{-T_0/2}^{T_0/2} s(t)\,dt\right| \le \lim_{T_0 \to \infty} \frac{A}{f_0 T_0} = 0$$

Figure 2.7: The interval $I$ for computing the time average of a periodic function with period
$1/f_0$ can be decomposed into an integer number $K$ of periods, with the remaining interval $I_r$ of
length $\ell < 1/f_0$.


Essentially  the  same  argument  implies  that,  in  general,  the  time  average  of  a  periodic  signal 
equals  the  average  over  a  single  period.  We  use  this  fact  without  further  comment  henceforth. 

Power and DC value of a sinusoid: For a real-valued sinusoid $s(t) = A\cos(2\pi f_0 t + \theta)$, we
can use the results derived for complex exponentials above. Using Euler’s identity, a real-valued
sinusoid at $f_0$ is a sum of complex exponentials at $\pm f_0$:

$$s(t) = \frac{A}{2}e^{j(2\pi f_0 t + \theta)} + \frac{A}{2}e^{-j(2\pi f_0 t + \theta)}$$

Since each complex exponential has zero DC value, we obtain

$$\overline{s} = 0$$

That is, the DC value of any real-valued sinusoid is zero. The power is given by

$$P_s = \overline{s^2(t)} = \overline{A^2\cos^2(2\pi f_0 t + \theta)} = \frac{A^2}{2} + \overline{\frac{A^2}{2}\cos(4\pi f_0 t + 2\theta)} = \frac{A^2}{2}$$

since the DC value of the sinusoid at $2f_0$ is zero.
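These time averages can be approximated numerically over a finite observation interval. In the Python sketch below, the parameter values, the helper name `time_average`, and the choice of an interval length that is deliberately not a whole number of periods are assumptions for illustration. For a finite $T_0$ the averages only approximate their limits, and the magnitude of the DC estimate for the complex exponential should respect the bound $A/(f_0 T_0)$ from the proof above.

```python
import cmath
import math

def time_average(g, T0, steps=20000):
    """Approximate (1/T0) * integral of g(t) over [-T0/2, T0/2] by a midpoint Riemann sum."""
    dt = T0 / steps
    total = sum(g(-T0 / 2 + (k + 0.5) * dt) for k in range(steps)) * dt
    return total / T0

A, f0, theta = 2.0, 3.0, 0.4               # hypothetical amplitude, frequency, phase

def cexp(t):
    return A * cmath.exp(1j * (2 * math.pi * f0 * t + theta))

def sinusoid(t):
    return A * math.cos(2 * math.pi * f0 * t + theta)

T0 = 10.37                                 # not an integer number of periods
dc_cexp = time_average(cexp, T0)
power_cexp = time_average(lambda t: abs(cexp(t)) ** 2, T0)
dc_sin = time_average(sinusoid, T0)
power_sin = time_average(lambda t: sinusoid(t) ** 2, T0)

print(abs(dc_cexp), A / (f0 * T0))         # DC magnitude and its upper bound
print(power_cexp)                          # close to A^2 = 4
print(dc_sin, power_sin)                   # close to 0 and A^2/2 = 2
```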


2.3  Linear  Time  Invariant  Systems 

System:  A  system  takes  as  input  one  or  more  signals,  and  produces  as  output  one  or  more 
signals.  A  system  is  specified  once  we  characterize  its  input-output  relationship;  that  is,  if  we 
can  determine  the  output,  or  response,  y(t),  corresponding  to  any  possible  input  x(t)  in  a  given 
class of signals of interest.

Our primary focus here is on linear time invariant (LTI) systems, which provide good models
for  filters  at  the  transmitter  and  receiver,  as  well  as  for  the  distortion  induced  by  a  variety  of 
channels.  We  shall  see  that  the  input-output  relationship  is  particularly  easy  to  characterize  for 
such  systems. 

Linear system: Let $x_1(t)$ and $x_2(t)$ denote arbitrary input signals, and let $y_1(t)$ and $y_2(t)$
denote the corresponding system outputs, respectively. Then, for arbitrary scalars $a_1$ and $a_2$, the
response of the system to input $a_1 x_1(t) + a_2 x_2(t)$ is $a_1 y_1(t) + a_2 y_2(t)$.

Time invariant system: Let $y(t)$ denote the system response to an input $x(t)$. Then the
system response to a time-shifted version of the input, $x_1(t) = x(t - t_0)$, is $y_1(t) = y(t - t_0)$. That
is, a time shift in the input causes an identical time shift in the output.


Example 2.3.1 (Examples of linear systems) It can (and should) be checked that the following
systems are linear. These examples show that linear systems may or may not be time invariant.

$$y(t) = 2x(t - 1) - jx(t - 2) \qquad \text{time invariant}$$
$$y(t) = (3 - 2j)x(1 - t) \qquad \text{time varying}$$
$$y(t) = x(t)\cos(100\pi t) - x(t - 1)\sin(100\pi t) \qquad \text{time varying}$$
$$y(t) = \int_{t-1}^{t+1} x(\tau)\,d\tau \qquad \text{time invariant}$$
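The definitions of linearity and time invariance translate directly into numerical spot checks. In the Python sketch below, signals are represented as ordinary functions of $t$ and a system as a function that maps one signal to another. The first two systems are taken from Example 2.3.1; the memoryless squarer, the test inputs, and all helper names are hypothetical additions for illustration. Agreement at a handful of test times is of course only evidence, not a proof, whereas a single disagreement does establish that the property fails.

```python
import math

def shift(x, t0):
    """Return the delayed signal t -> x(t - t0)."""
    return lambda t: x(t - t0)

def system_a(x):
    # y(t) = 2 x(t - 1) - j x(t - 2): linear and time invariant
    return lambda t: 2 * x(t - 1) - 1j * x(t - 2)

def system_b(x):
    # y(t) = (3 - 2j) x(1 - t): linear but time varying
    return lambda t: (3 - 2j) * x(1 - t)

def squarer(x):
    # Hypothetical memoryless system y(t) = x(t)^2: time invariant but nonlinear
    return lambda t: x(t) ** 2

def is_time_invariant(system, x, t0, test_times, tol=1e-9):
    """Compare the response to a delayed input with the delayed response."""
    lhs = system(shift(x, t0))
    rhs = shift(system(x), t0)
    return all(abs(lhs(t) - rhs(t)) < tol for t in test_times)

def is_linear(system, x1, x2, a1, a2, test_times, tol=1e-9):
    """Compare the response to a1*x1 + a2*x2 with a1*y1 + a2*y2."""
    lhs = system(lambda t: a1 * x1(t) + a2 * x2(t))
    y1, y2 = system(x1), system(x2)
    return all(abs(lhs(t) - (a1 * y1(t) + a2 * y2(t))) < tol for t in test_times)

def x1(t):
    return math.exp(-abs(t)) * math.cos(3 * t)   # hypothetical test input

def x2(t):
    return t                                      # hypothetical test input

times = [-1.0, 0.0, 0.3, 1.0, 2.0]
for name, system in [("a", system_a), ("b", system_b), ("squarer", squarer)]:
    print(name,
          is_linear(system, x1, x2, 2.0, -1j, times),
          is_time_invariant(system, x1, 0.5, times))
```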

Example 2.3.2 (Examples of time invariant systems) It can (and should) be checked that
the following systems are time invariant. These examples show that time invariant systems may
or may not be linear.

y(t)  =  nonlinear 

$$y(t) = \int_{-\infty}^{t} x(\tau)e^{-(t - \tau)}\,d\tau \qquad \text{linear}$$
$$y(t) = \int_{t-1}^{t+1} x^2(\tau)\,d\tau \qquad \text{nonlinear}$$

Linear time invariant system: A linear time invariant (LTI) system is (unsurprisingly) defined
to be a system which is both linear and time invariant. What is surprising, however, is how
powerful the LTI property is in terms of dictating what the input-output relationship must look
like. Specifically, if we know the impulse response of an LTI system (i.e., the output signal
when the input signal is the delta function), then we can compute the system response for any
input signal. Before deriving and stating this result, we illustrate the LTI property using an
example; see Figure 2.8. Suppose that the response of an LTI system to the rectangular pulse
$p_1(t) = I_{[-\frac{1}{2}, \frac{1}{2}]}(t)$ is given by the trapezoidal waveform $h_1(t)$. We can now compute the system
response to any linear combination of time shifts of the pulse $p_1(t)$, as illustrated by the example
in the figure. More generally, using the LTI property, we infer that the response to an input
signal of the form $x(t) = \sum_i a_i p_1(t - t_i)$ is $y(t) = \sum_i a_i h_1(t - t_i)$.

Figure 2.8: Given that the response of an LTI system S to the pulse $p_1(t)$ is $h_1(t)$, we can use the LTI property to infer that the response to $x(t) = 2p_1(t) - p_1(t-1)$ is $y(t) = 2h_1(t) - h_1(t-1)$.

Can we extend the preceding idea to compute the system response to arbitrary input signals? The answer is yes: if we know the system response to thinner and thinner pulses, then we can approximate arbitrary signals better and better using linear combinations of shifts of these pulses. Consider $p_\Delta(t) = \frac{1}{\Delta} I_{[-\Delta/2,\Delta/2]}(t)$, where $\Delta > 0$ is getting smaller and smaller. Note that we have normalized the area of the pulse to unity, so that the limit of $p_\Delta(t)$ as $\Delta \to 0$ is the delta function. Figure 2.9 shows how to approximate a smooth input signal as a linear combination of shifts of $p_\Delta(t)$. That is, for $\Delta$ small, we have

$$x(t) \approx x_\Delta(t) = \sum_{k=-\infty}^{\infty} x(k\Delta)\,\Delta\, p_\Delta(t - k\Delta) \tag{2.31}$$

Figure 2.9: A smooth signal can be approximated as a linear combination of shifts of tall thin pulses.
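As a rough numerical illustration of the approximation (2.31), the sketch below compares a smooth signal with its pulse-based approximation for a few pulse widths. The Gaussian test signal, the comparison grid and the list of widths are illustrative assumptions, not taken from the text. Since $\Delta\, p_\Delta(t - k\Delta)$ equals one on the interval of width $\Delta$ centered at $k\Delta$, the approximation simply holds the value $x(k\Delta)$ over that interval.

```matlab
x = @(t) exp(-t.^2);                 % illustrative smooth signal (assumption)
t = -3:0.001:3;                      % fine grid on which to compare
Deltas = [0.5 0.1 0.02];             % pulse widths to try (assumption)
err = zeros(size(Deltas));
for idx = 1:length(Deltas)
    D = Deltas(idx);
    xD = x(D*round(t/D));            % x(k*Delta) held over the pulse centered at k*Delta
    err(idx) = max(abs(x(t) - xD));  % worst-case approximation error on the grid
end
```

The entries of err should shrink roughly in proportion to the pulse width, which is the sense in which the approximation improves as the pulses get thinner.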


If the system response to $p_\Delta(t)$ is $h_\Delta(t)$, then we can use the LTI property to compute the response $y_\Delta(t)$ to $x_\Delta(t)$, and use this to approximate the response $y(t)$ to the input $x(t)$, as follows:

$$y(t) \approx y_\Delta(t) = \sum_{k=-\infty}^{\infty} x(k\Delta)\,\Delta\, h_\Delta(t - k\Delta) \tag{2.32}$$


As $\Delta \to 0$, the sums above tend to integrals, and the pulse $p_\Delta(t)$ tends to the delta function $\delta(t)$. The approximation to the input signal in equation (2.31) becomes exact, with the sum tending to an integral:

$$\lim_{\Delta \to 0} x_\Delta(t) = x(t) = \int_{-\infty}^{\infty} x(\tau)\,\delta(t - \tau)\,d\tau$$

replacing the discrete time shifts $k\Delta$ by the continuous variable $\tau$, the discrete increment $\Delta$ by the infinitesimal $d\tau$, and the sum by an integral. This is just a restatement of the sifting property of the impulse. That is, an arbitrary input signal can be expressed as a linear combination of time-shifted versions of the delta function, where we now consider a continuum of time shifts.

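In similar fashion, the approximation to the output signal in (2.32) becomes exact, with the sum reducing to the following convolution integral:

$$\lim_{\Delta \to 0} y_\Delta(t) = y(t) = \int_{-\infty}^{\infty} x(\tau)\,h(t - \tau)\,d\tau \tag{2.33}$$

where $h(t)$ denotes the impulse response of the LTI system (that is, the limit of the pulse response $h_\Delta(t)$ as $\Delta \to 0$).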

Convolution and its computation: The convolution $v(t)$ of two signals $u_1(t)$ and $u_2(t)$ is given by

$$v(t) = (u_1 * u_2)(t) = \int_{-\infty}^{\infty} u_1(\tau)\,u_2(t - \tau)\,d\tau = \int_{-\infty}^{\infty} u_1(t - \tau)\,u_2(\tau)\,d\tau \tag{2.34}$$

Note that $\tau$ is a dummy variable that is integrated out in order to determine the value of the signal $v(t)$ at each possible time $t$. The role of $u_1$ and $u_2$ in the integral can be exchanged. This can be proved using a change of variables, replacing $t - \tau$ by $\tau$. We often drop the time variable, and write $v = u_1 * u_2 = u_2 * u_1$.

An LTI system is completely characterized by its impulse response: As derived in (2.33), the output $y$ of an LTI system is the convolution of the input signal $u$ and the system impulse response $h$. That is, $y = u * h$. From (2.34), we realize that the role of the signal and the system can be exchanged: that is, we would get the same output $y$ if a signal $h$ is sent through a system with impulse response $u$.

Flip and slide: Consider the expression for the convolution in (2.34):

$$v(t) = \int_{-\infty}^{\infty} u_1(\tau)\,u_2(t - \tau)\,d\tau$$

Fix a value of time $t$ at which we wish to evaluate $v$. In order to compute $v(t)$, we must multiply two functions of a “dummy variable” $\tau$ and then integrate over $\tau$. In particular, $s_2(\tau) = u_2(-\tau)$ is the signal $u_2(\tau)$ flipped around the origin, so that $u_2(t - \tau) = u_2(-(\tau - t)) = s_2(\tau - t)$ is $s_2(\tau)$ translated to the right by $t$ (if $t < 0$, translation to the right by $t$ actually corresponds to translation to the left by $|t|$). In short, the mechanics of computing the convolution involves flipping and sliding one of the signals, multiplying with the other signal, and integrating. Pictures are extremely helpful when doing such computations by hand, as illustrated by the following example.


Figure 2.10: Illustrating the flip and slide operation for the convolution of two rectangular pulses. The flipped pulse $u_2(-\tau)$ occupies $[-3,-1]$; after sliding by $t$ it occupies $[t-3, t-1]$ and is multiplied with $u_1(\tau)$. The five cases shown are (a) $t-1 < 5$; (b) $t-3 < 5$, $t-1 > 5$; (c) $t-3 > 5$, $t-1 < 11$; (d) $t-3 < 11$, $t-1 > 11$; (e) $t-3 > 11$.

Figure 2.11: The convolution of the two rectangular pulses in Example 2.3.3 results in a trapezoidal pulse.


Example 2.3.3 Convolving rectangular pulses: Consider the rectangular pulses $u_1(t) = I_{[5,11]}(t)$ and $u_2(t) = I_{[1,3]}(t)$. We wish to compute the convolution

$$v(t) = (u_1 * u_2)(t) = \int_{-\infty}^{\infty} u_1(\tau)\,u_2(t - \tau)\,d\tau$$

We now draw pictures of the signals involved in these “flip and slide” computations in order to figure out the limits of integration for different ranges of $t$. Figure 2.10 shows that there are five different ranges of interest, and yields the following result:

(a) For $t < 6$, $u_1(\tau)u_2(t - \tau) = 0$, so that $v(t) = 0$.

(b) For $6 \leq t \leq 8$, $u_1(\tau)u_2(t - \tau) = 1$ for $5 \leq \tau \leq t - 1$, so that

$$v(t) = \int_{5}^{t-1} d\tau = t - 6$$

(c) For $8 \leq t \leq 12$, $u_1(\tau)u_2(t - \tau) = 1$ for $t - 3 \leq \tau \leq t - 1$, so that

$$v(t) = \int_{t-3}^{t-1} d\tau = 2$$

(d) For $12 \leq t \leq 14$, $u_1(\tau)u_2(t - \tau) = 1$ for $t - 3 \leq \tau \leq 11$, so that

$$v(t) = \int_{t-3}^{11} d\tau = 11 - (t - 3) = 14 - t$$

(e) For $t > 14$, $u_1(\tau)u_2(t - \tau) = 0$, so that $v(t) = 0$.

The  result  of  the  convolution  is  the  trapezoidal  pulse  sketched  in  Figure  2.11. 
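The five cases above can be cross-checked numerically. The following is a minimal sketch that approximates the convolution integral by a Riemann sum (the discretization described in Section 2.3.1); the sampling interval dt and the variable names are assumptions made for illustration.

```matlab
dt = 0.001;                               % sampling interval (assumed; smaller is more accurate)
t1 = 5:dt:11;  u1 = ones(size(t1));       % samples of the pulse on [5,11]
t2 = 1:dt:3;   u2 = ones(size(t2));       % samples of the pulse on [1,3]
v  = dt*conv(u1,u2);                      % Riemann-sum approximation of the convolution
t  = (t1(1)+t2(1)) + dt*(0:length(v)-1);  % output time axis starts at 5+1 = 6
% closed-form trapezoid assembled from cases (a)-(e)
v_exact  = max(0, min(min(t-6, 2), 14-t));
trap_err = max(abs(v - v_exact));         % expected to be on the order of dt
```

The residual error comes from the edge samples of the rectangular pulses and shrinks as dt is reduced.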


Figure 2.12: Convolution of two rectangular pulses as a function of pulse durations. The trapezoidal pulse reduces to a triangular pulse for equal pulse durations.


It is useful to record the general form of the convolution between two rectangular pulses of the form $I_{[-a/2,a/2]}(t)$ and $I_{[-b/2,b/2]}(t)$, where we take $a \leq b$ without loss of generality. The result is a trapezoidal pulse, which reduces to a triangular pulse for $a = b$, as shown in Figure 2.12. Once we know this, using the LTI property, we can infer the convolution of any signals which can be expressed as a linear combination of shifts of rectangular pulses.
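For reference, carrying out the flip and slide computation for these two centered pulses gives the closed form below. This is a worked result under the assumption $a \leq b$; it is written out here for convenience rather than quoted from the text.

$$\left(I_{[-a/2,a/2]} * I_{[-b/2,b/2]}\right)(t) = \begin{cases} a, & |t| \leq \frac{b-a}{2} \\ \frac{a+b}{2} - |t|, & \frac{b-a}{2} < |t| \leq \frac{a+b}{2} \\ 0, & |t| > \frac{a+b}{2} \end{cases}$$

As a consistency check, Example 2.3.3 corresponds to $a = 2$ and $b = 6$ with the pulses centered at 2 and 8, so that the result is centered at 10: the flat top of height 2 spans $[8,12]$ and the pulse vanishes outside $[6,14]$, in agreement with cases (a)-(e).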

Occasional notational sloppiness can be useful: As the preceding example shows, a convolution computation as in (2.34) requires a careful distinction between the variable $t$ at which the convolution is being evaluated, and the dummy variable $\tau$. This is why we make sure that the dummy variable does not appear in our notation $(s * r)(t)$ for the convolution between signals $s(t)$ and $r(t)$. However, it is sometimes convenient to abuse notation and use the notation $s(t) * r(t)$ instead, as long as we remain aware of what we are doing. For example, this enables us to compactly state the following linear time invariance (LTI) property:

$$(a_1 s_1(t - t_1) + a_2 s_2(t - t_2)) * r(t) = a_1 (s_1 * r)(t - t_1) + a_2 (s_2 * r)(t - t_2)$$

for any complex gains $a_1$ and $a_2$, and any time offsets $t_1$ and $t_2$.
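The LTI property stated above can be checked in discrete time, where time offsets become delays by an integer number of samples. The sketch below is only an illustration: the sequences, gains and delays are arbitrary choices, and the helper names delayed and pad are hypothetical.

```matlab
s1 = [1 2 1]; s2 = [1 -1]; r = [1 0.5 0.25];   % illustrative sequences (assumptions)
a1 = 2; a2 = -3+1j;                            % complex gains
d1 = 2; d2 = 5;                                % time offsets, in samples
N  = 12;                                       % common length, long enough to hold both shifted signals
delayed = @(x,d) [zeros(1,d) x zeros(1,N-d-length(x))];  % delay by d samples, pad to length N
lhs = conv(a1*delayed(s1,d1) + a2*delayed(s2,d2), r);    % combine first, then convolve
c1 = conv(s1,r); c2 = conv(s2,r);              % convolve first ...
M  = length(lhs);
pad = @(x,d) [zeros(1,d) x zeros(1,M-d-length(x))];
rhs = a1*pad(c1,d1) + a2*pad(c2,d2);           % ... then shift, scale and add
lti_err = max(abs(lhs - rhs));                 % zero up to rounding
```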

Example 2.3.4 (Modeling a multipath channel) We can get a delayed version of a signal by convolving it with a delayed impulse as follows:

$$y_1(t) = u(t) * \delta(t - t_1) = u(t - t_1) \tag{2.35}$$

To see this, compute

$$y_1(t) = \int u(\tau)\,\delta(t - \tau - t_1)\,d\tau = \int u(\tau)\,\delta(\tau - (t - t_1))\,d\tau = u(t - t_1)$$

where we first use the fact that the delta function is even, and then use its sifting property.


Figure  2.13:  Multipath  channels  typical  of  wireless  communication  can  include  line  of  sight  (LOS) 
and  reflected  paths. 


Equation (2.35) immediately tells us how to model multipath channels, in which multiple scattered versions of a transmitted signal $u(t)$ combine to give a received signal $y(t)$ which is a superposition of delayed versions of the transmitted signal, as illustrated in Figure 2.13:

$$y(t) = a_1 u(t - \tau_1) + \ldots + a_m u(t - \tau_m)$$

(plus noise, which we have not talked about yet). From (2.35), we see that we can write

$$y(t) = a_1 u(t) * \delta(t - \tau_1) + \ldots + a_m u(t) * \delta(t - \tau_m) = u(t) * \left(a_1 \delta(t - \tau_1) + \ldots + a_m \delta(t - \tau_m)\right)$$

That is, we can model the received signal as a convolution of the transmitted signal with a channel impulse response which is a linear combination of time-shifted impulses:

$$h(t) = a_1 \delta(t - \tau_1) + \ldots + a_m \delta(t - \tau_m) \tag{2.36}$$

Figure 2.14 illustrates how a rectangular pulse spreads as it goes through a multipath channel with impulse response $h(t) = \delta(t - 1) - 0.5\,\delta(t - 1.5) + 0.5\,\delta(t - 3.5)$. While the gains $\{a_k\}$ in this example are real-valued, as we shall soon see (in Section 2.8), we need to allow both the signal $u(t)$ and the gains $\{a_k\}$ to take complex values in order to model, for example, signals carrying information over radio channels.
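The multipath example can be reproduced numerically with a few lines of Matlab. This is a sketch under stated assumptions: the sampling interval is arbitrary, and the rectangular pulse is taken to occupy $[0,2]$, which is inferred from the axis labels of the figure rather than stated in the text. Each impulse in $h(t)$ is represented by a single unit-height sample scaled by its gain, so no extra scaling by the sampling interval is needed.

```matlab
dt = 0.01;                               % sampling interval (assumed)
u  = ones(1, round(2/dt));               % rectangular pulse, assumed to start at t=0 with duration 2
delays = [1 1.5 3.5];                    % path delays from h(t) in the text
gains  = [1 -0.5 0.5];                   % path gains from h(t) in the text
h = zeros(1, round(max(delays)/dt)+1);
h(round(delays/dt)+1) = gains;           % scaled unit samples at the path delays
y = conv(u,h);                           % superposition of delayed, scaled copies of u
t = dt*(0:length(y)-1);                  % output time axis, starting at t=0
```

Plotting with plot(t,y) should show the output stepping through the levels 1, 0.5, -0.5 and 0.5 as the delayed copies of the pulse overlap and then separate.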


Figure 2.14: A rectangular pulse through a multipath channel. The panels show the rectangular pulse $u(t)$, the multipath channel impulse response $h(t)$, and the channel output $y(t) = (u * h)(t)$.


Figure  2.15:  Complex  exponentials  are  eigenfunctions  of  LTI  systems. 


Complex exponential through an LTI system: In order to understand LTI systems in the frequency domain, let us consider what happens to a complex exponential $u(t) = e^{j2\pi f_0 t}$ when it goes through an LTI system with impulse response $h(t)$. The output is given by

$$y(t) = (u * h)(t) = \int_{-\infty}^{\infty} h(\tau)\,e^{j2\pi f_0 (t - \tau)}\,d\tau = e^{j2\pi f_0 t} \int_{-\infty}^{\infty} h(\tau)\,e^{-j2\pi f_0 \tau}\,d\tau = H(f_0)\,e^{j2\pi f_0 t} \tag{2.37}$$

where

$$H(f_0) = \int_{-\infty}^{\infty} h(\tau)\,e^{-j2\pi f_0 \tau}\,d\tau$$

is the Fourier transform of $h$ evaluated at the frequency $f_0$. We discuss the Fourier transform and its properties in more detail shortly.

Complex exponentials are eigenfunctions of LTI systems: Recall that an eigenvector of a matrix $H$ is any vector $x$ that satisfies $Hx = \lambda x$. That is, the matrix leaves its eigenvectors unchanged except for a scale factor $\lambda$, which is the eigenvalue associated with that eigenvector. In an entirely analogous fashion, we see that the complex exponential signal $e^{j2\pi f_0 t}$ is an eigenfunction of the LTI system with impulse response $h$, with eigenvalue $H(f_0)$. Since we have not constrained $h$, we conclude that complex exponentials are eigenfunctions of any LTI system. We shall soon see, when we discuss Fourier transforms, that this eigenfunction property allows us to characterize LTI systems in the frequency domain, which in turn enables powerful frequency domain design and analysis tools.
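The eigenfunction property can be checked numerically in discrete time. The sketch below is a minimal illustration under stated assumptions: the impulse response h and the normalized frequency f0 are arbitrary choices, and the sum H0 plays the role of $H(f_0)$ with the integral replaced by a sum. Because the input has finite length, the relation holds only once the filter is fully overlapped by the input, so the first few output samples are excluded from the comparison.

```matlab
h  = [1 -0.5 0.5 0.25];                    % illustrative impulse response (assumption)
f0 = 0.1;                                  % frequency in cycles per sample (assumption)
n  = 0:199;
u  = exp(1j*2*pi*f0*n);                    % sampled complex exponential
y  = conv(u,h);                            % output of the system
L  = length(h);
H0 = sum(h .* exp(-1j*2*pi*f0*(0:L-1)));   % discrete-time counterpart of H(f0)
idx = L:length(n);                         % skip the start-up transient
eig_err = max(abs(y(idx) - H0*u(idx)));    % output equals H0 times input, up to rounding
```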


2.3.1  Discrete  time  convolution 

DSP-based implementations of convolutions are inherently discrete time. For two discrete time sequences $\{u_1[n]\}$ and $\{u_2[n]\}$, their convolution $y = u_1 * u_2$ is defined analogously to continuous time convolution, replacing integration by summation:

$$y[n] = \sum_k u_1[k]\,u_2[n - k] \tag{2.38}$$

Matlab implements this using the “conv” function. This can be interpreted as $u_1$ being the input to a system with impulse response $u_2$, where a discrete time impulse is simply a one, followed by all zeros.
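To make the summation in (2.38) concrete, the following sketch evaluates it with explicit loops for two short sequences and compares the result with the built-in conv function. The sequences are arbitrary illustrative choices. Note that Matlab indices start at 1, so the array entry y(n) holds the sample $y[n-1]$.

```matlab
u1 = [1 2 3];  u2 = [1 -1 2 0.5];        % illustrative finite-length sequences (assumptions)
N1 = length(u1); N2 = length(u2);
y  = zeros(1, N1+N2-1);                  % the output has N1+N2-1 samples
for n = 1:N1+N2-1
    for k = 1:N1
        m = n-k+1;                       % array index corresponding to u2[n-k]
        if m >= 1 && m <= N2             % terms outside the support of u2 are zero
            y(n) = y(n) + u1(k)*u2(m);
        end
    end
end
conv_diff = max(abs(y - conv(u1,u2)));   % should be zero
```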

Continuous time convolution between $u_1(t)$ and $u_2(t)$ can be approximated using discrete time convolutions between the corresponding sampled signals. For example, for samples at rate $1/T_s$, the infinitesimal $d\tau$ is replaced by the sampling interval $T_s$ as follows:

$$y(t) = (u_1 * u_2)(t) = \int u_1(\tau)\,u_2(t - \tau)\,d\tau \approx \sum_k u_1(kT_s)\,u_2(t - kT_s)\,T_s$$

Evaluating at a sampling time $t = nT_s$, we have

$$y(nT_s) \approx T_s \sum_k u_1(kT_s)\,u_2(nT_s - kT_s)$$

Letting $x[n] = x(nT_s)$ denote the discrete time waveform corresponding to the $n$th sample for each of the preceding waveforms, we have

$$y(nT_s) = y[n] \approx T_s \sum_k u_1[k]\,u_2[n - k] = T_s (u_1 * u_2)[n] \tag{2.39}$$

which shows us how to implement continuous time convolution using discrete time operations.


Figure  2.16:  Two  signals  and  their  continuous  time  convolution,  computed  in  discrete  time  using 
Code  Fragment  2.3.1. 


The following Matlab code provides an example of a continuous time convolution approximated numerically using discrete time convolution, and then plotted against the original continuous time index $t$, as shown in Figure 2.16 (cosmetic touches not included in the code below). The two waveforms convolved are $u_1(t) = t^2 I_{[-1,1]}(t)$ and $u_2(t) = e^{-(t+1)} I_{[-1,\infty)}(t)$ (the latter is truncated in our discrete time implementation).

Code  Fragment  2.3.1  (Discrete  time  computation  of  continuous  time  convolution) 

dt=0.01; %sampling interval T_s
%%FIRST SIGNAL
u1start=-1; u1end=1; %start and end times for first signal
t1=u1start:dt:u1end; %sampling times for first signal
u1=t1.^2; %discrete time version of first signal
%%SECOND SIGNAL (exponential truncated when it gets small)
u2start=-1; u2end=10;
t2=u2start:dt:u2end;
u2=exp(-(t2+1));
%%APPROXIMATION OF CONTINUOUS TIME CONVOLUTION
y=dt*conv(u1,u2);
%%PLOT OF SIGNALS AND THEIR CONVOLUTION
ystart=u1start+u2start; %start time for convolution output
time_axis=ystart:dt:ystart+dt*(length(y)-1);
%%PLOT u1, u2 and y
plot(t1,u1,'r-.');
hold on;
plot(t2,u2,'r:');
plot(time_axis,y);
legend('u1','u2','y','Location','NorthEast');
hold off;


2.3.2  Multi-rate  systems 

While continuous time signals can be converted to discrete time by sampling “fast enough,” it is often required that we operate at multiple sampling rates. For example, in digital communication, we may send a string of symbols $\{b[n]\}$ (think of these as taking values +1 or -1 for now) by modulating them onto shifted versions of a pulse $p(t)$ as follows:

$$u(t) = \sum_n b[n]\,p(t - nT) \tag{2.40}$$

where $1/T$ is the rate at which symbols are generated (termed the symbol rate). In order to represent the analog pulse $p(t)$ as discrete time samples, we may sample it at rate $1/T_s$, typically chosen to be an integer multiple of the symbol rate, so that $T = mT_s$, where $m$ is a positive integer. Typical values employed in transmitter DSP modules might be $m = 4$ or $m = 8$. Thus, the system we are interested in is multi-rate: waveforms are sampled at rate $1/T_s = m/T$, but the input is at rate $1/T$. Set $u[k] = u(kT_s)$ and $p[k] = p(kT_s)$ as the discrete time signals corresponding to samples of the transmitted waveform $u(t)$ and the pulse $p(t)$, respectively. We can write the sampled version of (2.40) as

$$u[k] = \sum_n b[n]\,p(kT_s - nT) = \sum_n b[n]\,p[k - nm] \tag{2.41}$$

The preceding almost has the form of a discrete time convolution, but the key difference is that the successive symbols $\{b[n]\}$ are spaced by time $T$, which corresponds to $m > 1$ samples at the sampling rate $1/T_s$. Thus, in order to implement this system using convolution at rate $1/T_s$, we must space out the input symbols by inserting $m - 1$ zeros between successive symbols $b[n]$, thus converting a rate $1/T$ signal to a rate $1/T_s = m/T$ signal. This process is termed upsampling. While the upsampling function is available in certain Matlab toolboxes, we provide a self-contained code fragment below that illustrates its use for digital modulation, and plots the waveform obtained for symbol sequence $-1, +1, +1, -1$. The modulating pulse is a sine pulse: $p(t) = \sin(\pi t/T)\,I_{[0,T]}(t)$, and our convention is to set $T = 1$ without loss of generality (or, equivalently, to replace $t$ by $t/T$). We set the oversampling factor $m = 16$ in order to obtain smooth plots, even though typical implementations in communication transmitters may use smaller values.

Code  Fragment  2.3.2  (Upsampling  for  digital  modulation) 

m=16; %sampling rate as multiple of symbol rate
%discrete time representation of sine pulse
time_p = 0:1/m:1; %sampling times over duration of pulse
p = sin(pi*time_p); %samples of the pulse
%symbols to be modulated
symbols = [-1;1;1;-1];
%UPSAMPLE BY m
nsymbols = length(symbols); %length of original symbol sequence
nsymbols_upsampled = 1+(nsymbols-1)*m; %length of upsampled symbol sequence
symbols_upsampled = zeros(nsymbols_upsampled,1); %initialize to all zeros
symbols_upsampled(1:m:nsymbols_upsampled)=symbols; %insert symbols with spacing m
%GENERATE MODULATED SIGNAL BY DISCRETE TIME CONVOLUTION
u=conv(symbols_upsampled,p);
%PLOT MODULATED SIGNAL
time_u = 0:1/m:(length(u)-1)/m; %unit of time = symbol time T
plot(time_u,u);
xlabel('t/T');

Figure 2.17: Digitally modulated waveform obtained using Code Fragment 2.3.2.

2.4  Fourier  Series 

Fourier series represent periodic signals in terms of sinusoids or complex exponentials. A signal $u(t)$ is periodic with period $T$ if $u(t + T) = u(t)$ for all $t$. Note that, if $u$ is periodic with period $T$, then it is also periodic with period $nT$, where $n$ is any positive integer. The smallest time interval for which $u(t)$ is periodic is termed the fundamental period. Let us denote this by $T_0$, and define the corresponding fundamental frequency $f_0 = 1/T_0$ (measured in Hertz if $T_0$ is measured in seconds). It is easy to show that if $u(t)$ is periodic with period $T$, then $T$ must be an integer multiple of $T_0$. In the following, we often simply refer to the fundamental period as “period.”

Using mathematical machinery beyond our current scope, it can be shown that any periodic signal with period $T_0$ (subject to mild technical conditions) can be expressed as a linear combination of complex exponentials

$$\psi_m(t) = e^{j2\pi m f_0 t} = e^{j2\pi m t/T_0}, \quad m = 0, \pm 1, \pm 2, \ldots$$

whose frequencies are integer multiples of the fundamental frequency $f_0$. That is, we can write

$$u(t) = \sum_{n=-\infty}^{\infty} u_n \psi_n(t) = \sum_{n=-\infty}^{\infty} u_n e^{j2\pi n f_0 t} \tag{2.42}$$

The coefficients $\{u_n\}$ are in general complex-valued, and are called the Fourier series for $u(t)$. They can be computed as follows:

$$u_k = \frac{1}{T_0} \int_{T_0} u(t)\,e^{-j2\pi k f_0 t}\,dt \tag{2.43}$$

where $\int_{T_0}$ denotes an integral over any interval of length $T_0$.


Let us now derive (2.43). Consider an arbitrary interval of length $T_0$, of the form $[D, D + T_0]$, where the offset $D$ is free to take on any real value. Then, for any nonzero integer $m$, we have

$$\int_{D}^{D+T_0} e^{j2\pi m f_0 t}\,dt = \left. \frac{e^{j2\pi m f_0 t}}{j2\pi m f_0} \right|_{D}^{D+T_0} = \frac{e^{j(2\pi m f_0 D + 2\pi m)} - e^{j2\pi m f_0 D}}{j2\pi m f_0} = 0 \tag{2.44}$$

since $f_0 T_0 = 1$ and $e^{j2\pi m} = 1$. Thus, when we multiply both sides of (2.42) by $e^{-j2\pi k f_0 t}$ and integrate over a period, all terms corresponding to $n \neq k$ drop out by virtue of (2.44), and we are left only with the $n = k$ term:

$$\int_{D}^{D+T_0} u(t)\,e^{-j2\pi k f_0 t}\,dt = \int_{D}^{D+T_0} \left[ \sum_{n=-\infty}^{\infty} u_n e^{j2\pi n f_0 t} \right] e^{-j2\pi k f_0 t}\,dt$$

$$= u_k \int_{D}^{D+T_0} e^{j2\pi k f_0 t}\,e^{-j2\pi k f_0 t}\,dt + \sum_{n \neq k} u_n \int_{D}^{D+T_0} e^{j2\pi (n-k) f_0 t}\,dt = u_k T_0 + 0$$

which proves (2.43).

We denote the Fourier series relationship (2.42)-(2.43) as $u(t) \leftrightarrow \{u_n\}$. It is useful to keep in mind the geometric meaning of this relationship. The space of periodic signals with period $T_0 = \frac{1}{f_0}$ can be thought of in the same way as the finite-dimensional vector spaces we are familiar with, except that the inner product between two periodic signals is given by

$$\langle u, v \rangle_{T_0} = \int_{T_0} u(t)\,v^*(t)\,dt$$

The energy over a period for a signal $u$ is given by $\|u\|^2_{T_0} = \langle u, u \rangle_{T_0}$, where $\|u\|_{T_0}$ denotes the norm computed over a period. We have assumed that the Fourier basis $\{\psi_n(t) = e^{j2\pi n f_0 t}\}$ spans this vector space, and have computed the Fourier series after showing that the basis is orthogonal:

$$\langle \psi_n, \psi_m \rangle_{T_0} = 0, \quad n \neq m$$

and that the basis functions have equal energy:

$$\|\psi_n\|^2_{T_0} = \langle \psi_n, \psi_n \rangle_{T_0} = T_0$$

The computation of the expression for the Fourier series $\{u_k\}$ can be rewritten in these vector space terms as follows. A periodic signal $u(t)$ can be expanded in terms of the Fourier basis as

$$u(t) = \sum_{n=-\infty}^{\infty} u_n \psi_n(t) \tag{2.45}$$

Using the orthogonality of the basis functions, we have

$$\langle u, \psi_k \rangle_{T_0} = \sum_n u_n \langle \psi_n, \psi_k \rangle_{T_0} = u_k \|\psi_k\|^2$$

That is,

$$u_k = \frac{\langle u, \psi_k \rangle_{T_0}}{\|\psi_k\|^2} = \frac{\langle u, \psi_k \rangle_{T_0}}{T_0} \tag{2.46}$$


In  general,  the  Fourier  series  of  an  arbitrary  periodic  signal  may  have  an  infinite  number  of  terms. 
In  practice,  one  might  truncate  the  Fourier  series  at  a  finite  number  of  terms,  with  the  number 
of  terms  required  to  provide  a  good  approximation  to  the  signal  depending  on  the  nature  of  the 
signal. 


Figure  2.18:  Square  wave  with  period  T0. 


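Example 2.4.1 Fourier series of a square wave: Consider the periodic waveform $u(t)$ as shown in Figure 2.18. For $k = 0$, we get the DC value $u_0 = \frac{A_{max} + A_{min}}{2}$. For $k \neq 0$, we have, using (2.43), that

$$u_k = \frac{1}{T_0} \int_{-T_0/2}^{0} A_{min}\,e^{-j2\pi kt/T_0}\,dt + \frac{1}{T_0} \int_{0}^{T_0/2} A_{max}\,e^{-j2\pi kt/T_0}\,dt$$

$$= \frac{A_{min}}{T_0} \left. \frac{e^{-j2\pi kt/T_0}}{-j2\pi k/T_0} \right|_{-T_0/2}^{0} + \frac{A_{max}}{T_0} \left. \frac{e^{-j2\pi kt/T_0}}{-j2\pi k/T_0} \right|_{0}^{T_0/2} = \frac{A_{min}\left(1 - e^{j\pi k}\right) + A_{max}\left(e^{-j\pi k} - 1\right)}{-j2\pi k}$$

For $k$ even, $e^{j\pi k} = e^{-j\pi k} = 1$, which yields $u_k = 0$. That is, there are no even harmonics. For $k$ odd, $e^{j\pi k} = e^{-j\pi k} = -1$, which yields $u_k = \frac{A_{max} - A_{min}}{j\pi k}$. We therefore obtain

$$u_k = \begin{cases} 0, & k \text{ even},\ k \neq 0 \\ \frac{A_{max} - A_{min}}{j\pi k}, & k \text{ odd} \end{cases}$$

Combining the terms for positive and negative $k$, we obtain

$$u(t) = \frac{A_{max} + A_{min}}{2} + \sum_{k > 0,\ k \text{ odd}} \frac{2(A_{max} - A_{min})}{\pi k} \sin(2\pi kt/T_0)$$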

Example 2.4.2 Fourier series of an impulse train: Even though the delta function is not physically realizable, the Fourier series of an impulse train, as shown in Figure 2.19, turns out to be extremely useful in theoretical development and in computations. Specifically, consider

$$u(t) = \sum_{n=-\infty}^{\infty} \delta(t - nT_0)$$

Figure 2.19: An impulse train of period $T_0$.

By integrating over an interval of length $T_0$ centered around the origin, we obtain

$$u_k = \frac{1}{T_0} \int_{-T_0/2}^{T_0/2} u(t)\,e^{-j2\pi k f_0 t}\,dt = \frac{1}{T_0} \int_{-T_0/2}^{T_0/2} \delta(t)\,e^{-j2\pi k f_0 t}\,dt = \frac{1}{T_0}$$

using the sifting property of the impulse. That is, the impulse train has equal frequency content at all harmonics. This is yet another manifestation of the physical unrealizability of the impulse: for well-behaved signals, the Fourier series should decay as the frequency increases.


While we have considered signals which are periodic functions of time, the concept of Fourier series applies to periodic functions in general, whatever the physical interpretation of the argument of the function. In particular, as we shall see when we discuss the effect of time domain sampling in the context of digital communication, the time domain samples of a waveform can be interpreted as the Fourier series for a particular periodic function of frequency.


2.4.1  Fourier  Series  Properties  and  Applications 

We  now  state  some  Fourier  series  properties  which  are  helpful  both  for  computation  and  for 
developing  intuition.  The  derivations  are  omitted,  since  they  follow  in  a  straightforward  manner 
from  (2.42)-(2.43),  and  are  included  in  any  standard  text  on  signals  and  systems.  In  the  following, 
u(t),  v(t)  denote  periodic  waveforms  of  period  T0  and  Fourier  series  {uk},  {vk}  respectively. 

Linearity: For arbitrary complex numbers $\alpha$, $\beta$,

$$\alpha u(t) + \beta v(t) \leftrightarrow \{\alpha u_k + \beta v_k\}$$

Time delay corresponds to linear phase in frequency domain:

$$u(t - d) \leftrightarrow \{u_k e^{-j2\pi k f_0 d} = u_k e^{-j2\pi k d/T_0}\}$$

The Fourier series of a real-valued signal is conjugate symmetric: If $u(t)$ is real-valued, then $u_k = u_{-k}^*$.

Harmonic structure of real-valued periodic signals: While both the Fourier series coefficients and the complex exponential basis functions are complex-valued, for real-valued $u(t)$, the linear combination on the right-hand side of (2.42) must be real-valued. In particular, as we show below, the terms corresponding to $u_k$ and $u_{-k}$ ($k \geq 1$) combine together into a real-valued sinusoid which we term the $k$th harmonic. Specifically, writing $u_k = A_k e^{j\phi_k}$ in polar form, we invoke the conjugate symmetry of the Fourier series for real-valued $u(t)$ to infer that $u_{-k} = u_k^* = A_k e^{-j\phi_k}$. The Fourier series can therefore be written as

$$u(t) = u_0 + \sum_{k=1}^{\infty} \left( u_k e^{j2\pi k f_0 t} + u_{-k} e^{-j2\pi k f_0 t} \right) = u_0 + \sum_{k=1}^{\infty} \left( A_k e^{j\phi_k} e^{j2\pi k f_0 t} + A_k e^{-j\phi_k} e^{-j2\pi k f_0 t} \right)$$

This yields the following Fourier series in terms of real-valued sinusoids:

$$u(t) = u_0 + \sum_{k=1}^{\infty} 2A_k \cos(2\pi k f_0 t + \phi_k) = u_0 + \sum_{k=1}^{\infty} 2|u_k| \cos(2\pi k f_0 t + \angle u_k) \tag{2.47}$$


Differentiation amplifies higher frequencies:

$$x(t) = \frac{d}{dt} u(t) \leftrightarrow x_k = j2\pi k f_0 u_k \tag{2.48}$$

Note that differentiation kills the DC term, i.e., $x_0 = 0$. However, the information at all other frequencies is preserved. That is, if we know $\{x_k\}$ then we can recover $\{u_k, k \neq 0\}$ as follows:

$$u_k = \frac{x_k}{j2\pi f_0 k}, \quad k \neq 0 \tag{2.49}$$

This is a useful property, since differentiation often makes Fourier series easier to compute.
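A quick numerical check of the differentiation property is sketched below. The test signal (a sum of two sinusoids) and its period are illustrative assumptions; the derivative is worked out by hand, and the Fourier coefficients of both waveforms are computed by averaging over uniformly spaced samples of one period.

```matlab
T0 = 2; f0 = 1/T0;                              % illustrative period (assumption)
N  = 4096;
t  = (0:N-1)*T0/N;                              % uniform samples over one period
u  = cos(2*pi*f0*t) + 0.5*sin(2*pi*2*f0*t);     % illustrative smooth periodic signal
x  = -2*pi*f0*sin(2*pi*f0*t) + 2*pi*f0*cos(2*pi*2*f0*t);  % its derivative, computed by hand
k  = -3:3;
uk = zeros(size(k)); xk = zeros(size(k));
for idx = 1:length(k)
    e = exp(-1j*2*pi*k(idx)*f0*t);
    uk(idx) = mean(u.*e);                       % (1/T0) times the integral over a period
    xk(idx) = mean(x.*e);
end
diff_err = max(abs(xk - 1j*2*pi*f0*k.*uk));     % x_k should equal j*2*pi*k*f0*u_k
```

The entry of xk corresponding to $k = 0$ is zero, reflecting the loss of the DC term under differentiation.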


Figure 2.20: The derivative of a square wave is two interleaved impulse trains.


Example 2.4.1 redone (using differentiation to simplify Fourier series computation): Differentiating the square wave in Figure 2.18 gives us two interleaved impulse trains, one corresponding to the upward edges of the rectangular pulses, and the other to the downward edges of the rectangular pulses, as shown in Figure 2.20.

$$x(t) = \frac{d}{dt} u(t) = (A_{max} - A_{min}) \left( \sum_k \delta(t - kT_0) - \sum_k \delta(t - kT_0 - T_0/2) \right)$$

Compared to the impulse train in Example 2.4.2, the first impulse train above is offset by 0, while the second is offset by $T_0/2$ (and inverted). We can therefore infer their Fourier series using the time delay property, and add them up by linearity, to obtain

$$x_k = \frac{A_{max} - A_{min}}{T_0} - \frac{A_{max} - A_{min}}{T_0}\,e^{-j2\pi f_0 k T_0/2} = \frac{A_{max} - A_{min}}{T_0} \left( 1 - e^{-j\pi k} \right)$$

Using the differentiation property, we can therefore infer that

$$u_k = \frac{x_k}{j2\pi f_0 k} = \frac{A_{max} - A_{min}}{j2\pi k f_0 T_0} \left( 1 - e^{-j\pi k} \right), \quad k \neq 0$$

which gives us the same result as before (recall that $f_0 T_0 = 1$). Note that the DC term $u_0$ cannot be obtained using this approach, since it vanishes upon differentiation. But it is easy to compute, since it is just the average value of $u(t)$, which can be seen to be $u_0 = (A_{max} + A_{min})/2$ by inspection.

In addition to simplifying computation for waveforms which can be described (or approximated) as polynomial functions of time (so that enough differentiation ultimately reduces them to impulse trains), the differentiation method explicitly reveals how the harmonic structure (i.e., the strength and location of the harmonics) of a periodic waveform is related to its transitions in the time domain. Once we understand the harmonic structure, we can shape it by appropriate filtering. For example, if we wish to generate a sinusoid of frequency 300 MHz using a digital circuit capable of generating symmetric square waves of frequency 100 MHz, we can choose a filter to isolate the third harmonic. However, we cannot generate a sinusoid of frequency 200 MHz (unless we make the square wave suitably asymmetric), since the even harmonics do not exist for a symmetric square wave (i.e., a square wave whose high and low durations are the same).

Parseval’s identity (periodic inner product/power can be computed in either time or frequency domain): Using the orthogonality of complex exponentials over a period, it can be shown that

$$\langle u, v \rangle_{T_0} = \int_{T_0} u(t)\,v^*(t)\,dt = T_0 \sum_{k=-\infty}^{\infty} u_k v_k^* \tag{2.50}$$

Setting $v = u$, and dividing both sides by $T_0$, the preceding specializes to an expression for signal power (which can be computed for a periodic signal by averaging over a period):

$$\frac{1}{T_0} \int_{T_0} |u(t)|^2\,dt = \sum_{k=-\infty}^{\infty} |u_k|^2 \tag{2.51}$$
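As an illustration of (2.51), the sketch below compares the time domain power of the symmetric square wave of Example 2.4.1 with a truncated sum of squared Fourier coefficients, using the closed-form coefficients derived in that example. The two levels and the truncation point K are assumptions chosen for illustration.

```matlab
Amin = -1; Amax = 3;                     % illustrative levels (assumptions)
P_time = (Amax^2 + Amin^2)/2;            % average of |u(t)|^2: each level holds for half a period
K  = 100001;                             % truncation point for the sum (assumption)
k  = 1:2:K;                              % positive odd harmonics
u0 = (Amax+Amin)/2;                      % DC coefficient
uk_sq  = ((Amax-Amin)./(pi*k)).^2;       % |u_k|^2 for odd k; even harmonics vanish
P_freq = u0^2 + 2*sum(uk_sq);            % factor 2 accounts for both k and -k
parseval_gap = P_time - P_freq;          % small and positive: the neglected tail of the sum
```

The gap shrinks as K grows, since the neglected terms decay as $1/k^2$.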


2.5  Fourier  Transform 

We define the Fourier transform $U(f)$ for an aperiodic, finite energy waveform $u(t)$ as

$$U(f) = \int_{-\infty}^{\infty} u(t)\,e^{-j2\pi ft}\,dt, \quad -\infty < f < \infty \qquad \text{Fourier Transform} \tag{2.52}$$

The inverse Fourier transform is given by

$$u(t) = \int_{-\infty}^{\infty} U(f)\,e^{j2\pi ft}\,df, \quad -\infty < t < \infty \qquad \text{Inverse Fourier Transform} \tag{2.53}$$

The inverse Fourier transform tells us that any finite energy signal can be written as a linear combination of a continuum of complex exponentials, with the coefficients of the linear combination given by the Fourier transform $U(f)$.

Notation: We call a signal and its Fourier transform a Fourier transform pair, and denote them as $u(t) \leftrightarrow U(f)$. We also denote the Fourier transform operation by $\mathcal{F}$, so that $U(f) = \mathcal{F}(u(t))$.

Example 2.5.1 Rectangular pulse and sinc function form a Fourier transform pair: Consider the rectangular pulse $u(t) = I_{[-T/2,T/2]}(t)$ of duration $T$. Its Fourier transform is given by

$$U(f) = \int_{-\infty}^{\infty} u(t)\,e^{-j2\pi ft}\,dt = \int_{-T/2}^{T/2} e^{-j2\pi ft}\,dt = \left. \frac{e^{-j2\pi ft}}{-j2\pi f} \right|_{-T/2}^{T/2} = \frac{e^{-j\pi fT} - e^{j\pi fT}}{-j2\pi f} = \frac{\sin(\pi fT)}{\pi f} = T\,\mathrm{sinc}(fT)$$

We denote this as

$$I_{[-T/2,T/2]}(t) \leftrightarrow T\,\mathrm{sinc}(fT)$$
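This transform pair can be checked by evaluating the Fourier transform integral numerically at a few frequencies. The sketch below is illustrative only: the pulse duration, the test frequencies and the number of integration points are assumptions, and the closed form is written out as $\sin(\pi fT)/(\pi f)$ so that no toolbox function is needed.

```matlab
T  = 2;                                  % pulse duration (illustrative)
N  = 20000;                              % number of integration points (assumption)
t  = -T/2 + ((0:N-1)+0.5)*T/N;           % midpoints of N subintervals of [-T/2, T/2]
f  = [0.1 0.25 0.5 0.8 1.3];             % test frequencies, avoiding f = 0
U_num = zeros(size(f));
for idx = 1:length(f)
    U_num(idx) = sum(exp(-1j*2*pi*f(idx)*t))*(T/N);  % Riemann sum for the transform integral
end
U_formula = sin(pi*f*T)./(pi*f);         % T*sinc(f*T), with sinc(x) = sin(pi*x)/(pi*x)
ft_err = max(abs(U_num - U_formula));
```

The imaginary part of U_num is negligible, as expected for a pulse that is symmetric around the origin.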


Duality: Given the similarity of the form of the Fourier transform (2.52) and inverse Fourier transform (2.53), we can see that the roles of time and frequency can be switched simply by negating one of the arguments. In particular, suppose that $u(t) \leftrightarrow U(f)$. Define the time domain signal $s(t) = U(t)$, replacing $f$ by $t$. Then the Fourier transform of $s(t)$ is given by $S(f) = u(-f)$, replacing $t$ by $-f$. Since negating the argument corresponds to reflection around the origin, we can simply switch time and frequency for signals which are symmetric around the origin. Applying duality to Example 2.5.1, we infer that a signal that is ideally bandlimited in frequency corresponds to a sinc function in time:

$$I_{[-W/2,W/2]}(f) \leftrightarrow W\,\mathrm{sinc}(Wt)$$

Relation  to  Fourier  series:  The  Fourier  transform  can  be  obtained  by  taking  the  limit  of  the 
Fourier series as the period gets large, with $T_0 \to \infty$ and $f_0 \to 0$ (think of an aperiodic signal
as periodic with infinite period). We do not provide details, but sketch the process of taking
this limit: $T_0 u_k$ tends to $U(f)$, where $f = k f_0$, and the Fourier series sum in (2.42) becomes the
inverse Fourier transform integral in (2.53), with $f_0$ becoming $df$. Not surprisingly, therefore, the
Fourier  transform  exhibits  properties  entirely  analogous  to  those  for  Fourier  series,  as  we  shall 
see  shortly.  However,  the  Fourier  transform  applies  to  a  broader  class  of  signals,  and  we  can 
take  advantage  of  time-frequency  duality  more  easily,  because  both  time  and  frequency  are  now 
continuous-valued  variables. 

Application  to  infinite  energy  signals:  In  engineering  applications,  we  routinely  apply  the 
Fourier  and  the  inverse  Fourier  transform  to  infinite  energy  signals,  even  though  the  derivation 
of  the  Fourier  transform  as  the  limit  of  a  Fourier  series  is  based  on  the  assumption  that  the 
signal  has  finite  energy.  While  infinite  energy  signals  are  not  physically  realizable,  they  are 
useful  approximations  of  finite  energy  signals,  often  simplifying  mathematical  manipulations. 
For  example,  instead  of  considering  a  sinusoid  over  a  large  time  interval,  we  can  consider  a 
sinusoid  of  infinite  duration.  As  we  shall  see,  this  leads  to  an  impulsive  function  in  the  frequency 
domain.  As  another  example,  delta  functions  in  the  time  domain  are  useful  in  modeling  the 
impulse  response  of  wireless  multipath  channels.  Basically,  once  we  are  willing  to  work  with 
impulses,  we  can  use  the  Fourier  transform  on  a  very  broad  class  of  signals. 


Example 2.5.2 The delta function and the constant function form a Fourier transform
pair: For $u(t) = \delta(t)$, we have

$$U(f) = \int_{-\infty}^{\infty} \delta(t) e^{-j2\pi ft} \, dt = 1$$

for all $f$. That is,

$$\delta(t) \leftrightarrow I_{(-\infty,\infty)}(f)$$

Example 2.5.3 Complex exponentials in the time domain correspond to impulses in the
frequency domain: Let us show this using the inverse Fourier transform. For a frequency
domain impulse at $f_0$, $U(f) = \delta(f - f_0)$, the inverse Fourier transform is given by

$$u(t) = \int_{-\infty}^{\infty} \delta(f - f_0) e^{j2\pi ft} \, df = e^{j2\pi f_0 t}$$

using the sifting property of the impulse. That is,

$$e^{j2\pi f_0 t} \leftrightarrow \delta(f - f_0)$$

Once  we  embrace  frequency  domain  impulses,  we  can  fold  Fourier  series  into  Fourier  transforms 
as  follows. 


Example  2.5.4  Fourier  series  expressed  in  terms  of  Fourier  transforms:  We  know  that 
a  periodic  signal  u(t)  with  period  T0  can  be  written  as 

$$u(t) = \sum_{n=-\infty}^{\infty} u_n e^{j2\pi n f_0 t}$$

where $f_0 = 1/T_0$ is the fundamental frequency and $\{u_n\}$ are the Fourier series coefficients. Using
Example 2.5.3 to take the Fourier transform of both sides, we obtain

$$U(f) = \sum_{n=-\infty}^{\infty} u_n \delta(f - n f_0)$$

Thus,  the  Fourier  transform  of  a  periodic  signal  is  constituted  of  impulses  at  the  harmonics,  with 
coefficients  given  by  the  Fourier  series. 


Now  that  we  have  seen  both  the  Fourier  series  and  the  Fourier  transform,  it  is  worth  commenting 
on  the  following  frequently  asked  questions. 


What do negative frequencies mean? Why do we need them? Consider a real-valued
sinusoid $A\cos(2\pi f_0 t + \theta)$, where $f_0 > 0$. If we now replace $f_0$ by $-f_0$, we obtain
$A\cos(-2\pi f_0 t + \theta) = A\cos(2\pi f_0 t - \theta)$, using the fact that cosine is an even function. Thus, we
do not need negative frequencies when working with real-valued sinusoids. However, unlike complex
exponentials, real-valued sinusoids are not eigenfunctions of LTI systems: we can pass a cosine
through an LTI system and get a sine, for example. Thus, once we decide to work with a basis formed
by complex exponentials, we do need both positive and negative frequencies in order to describe all
signals of interest. For example, a real-valued sinusoid can be written in terms of complex
exponentials as

$$A\cos(2\pi f_0 t + \theta) = \frac{A}{2}\left(e^{j(2\pi f_0 t + \theta)} + e^{-j(2\pi f_0 t + \theta)}\right) = \frac{A}{2} e^{j\theta} e^{j2\pi f_0 t} + \frac{A}{2} e^{-j\theta} e^{-j2\pi f_0 t}$$

so that we need complex exponentials at both $+f_0$ and $-f_0$ to describe a real-valued sinusoid
at frequency $f_0$. Of course, the coefficients multiplying these two complex exponentials are not
arbitrary: they are complex conjugates of each other. More generally, as we have already seen,
such conjugate symmetry holds for both Fourier series and Fourier transforms of real-valued
signals. We can therefore state the following:

(a)  We  do  need  both  positive  and  negative  frequencies  to  form  a  complete  basis  using  complex 
exponentials; 

(b) For real-valued (i.e., physically realizable) signals, the expansion in terms of a complex
exponential basis, whether it is the Fourier series or the Fourier transform, exhibits conjugate
symmetry. Hence, we only need to know the Fourier series or Fourier transform of a real-valued
signal for positive frequencies.

2.5.1 Fourier Transform Properties


We now state some key properties of the Fourier transform. In the following, $u(t)$, $v(t)$ denote
signals with Fourier transforms $U(f)$, $V(f)$, respectively.

Linearity: For arbitrary complex numbers $\alpha$, $\beta$,

$$\alpha u(t) + \beta v(t) \leftrightarrow \alpha U(f) + \beta V(f)$$

Time delay corresponds to linear phase in frequency domain:

$$u(t - t_0) \leftrightarrow U(f) e^{-j2\pi f t_0}$$

Frequency shift corresponds to modulation by complex exponential:

$$U(f - f_0) \leftrightarrow u(t) e^{j2\pi f_0 t}$$

The Fourier transform of a real-valued signal is conjugate symmetric: If $u(t)$ is
real-valued, then $U(f) = U^*(-f)$.

Differentiation  in  the  time  domain  amplifies  higher  frequencies: 

$$x(t) = \frac{d}{dt} u(t) \leftrightarrow X(f) = j2\pi f U(f)$$

As for Fourier series, differentiation kills the DC term, i.e., $X(0) = 0$. However, the information
at all other frequencies is preserved. Thus, if we know $X(f)$ then we can recover $U(f)$ for $f \neq 0$
as follows:

$$U(f) = \frac{X(f)}{j2\pi f}, \quad f \neq 0 \tag{2.54}$$

This specifies the Fourier transform almost everywhere (except at DC: $f = 0$). If $U(f)$ is finite
everywhere, then we do not need to worry about its value at a particular point, and can leave
$U(0)$ unspecified, or define it as the limit of (2.54) as $f \to 0$ (and if this limit does not exist,
we can set $U(0)$ to be the left limit, or the right limit, or any number in between). In short, we
can simply adopt (2.54) as the expression for $U(f)$ for all $f$, when $U(0)$ is finite. However, the
DC term does matter when $u(t)$ has a nonzero average value, in which case we get an impulse
at DC. The average value of $u(t)$ is given by

$$\bar{u} = \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} u(t) \, dt$$

and has Fourier transform given by $\bar{u}(t) \equiv \bar{u} \leftrightarrow \bar{u}\,\delta(f)$. Thus, we can write the overall Fourier
transform as

$$U(f) = \frac{X(f)}{j2\pi f} + \bar{u}\,\delta(f) \tag{2.55}$$

We  illustrate  this  via  the  following  example. 

Example 2.5.5 (Fourier transform of a step function) Let us use differentiation to compute
the Fourier transform of the unit step function

$$u(t) = \begin{cases} 0, & t < 0 \\ 1, & t > 0 \end{cases}$$

Its DC value is given by

$$\bar{u} = 1/2$$

and its derivative is the delta function (see Figure 2.21):

$$x(t) = \frac{d}{dt} u(t) = \delta(t) \leftrightarrow X(f) = 1$$

Applying (2.55), we obtain that the Fourier transform of the unit step function is given by

$$U(f) = \frac{1}{j2\pi f} + \frac{1}{2}\delta(f)$$

Figure 2.21: The unit step function and its derivative, the delta function.

Parseval’s identity (inner product/energy can be computed in either time or frequency
domain):

$$\int_{-\infty}^{\infty} u(t) v^*(t) \, dt = \int_{-\infty}^{\infty} U(f) V^*(f) \, df$$

Setting $v = u$, we get an expression for the energy of a signal:

$$\|u\|^2 = \int_{-\infty}^{\infty} |u(t)|^2 \, dt = \int_{-\infty}^{\infty} |U(f)|^2 \, df$$
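
As a numerical illustration of the energy form of Parseval’s identity, one can take the rectangular pulse of Example 2.5.1, whose energy in the time domain is simply $T$, and integrate $|U(f)|^2 = T^2\,\mathrm{sinc}^2(fT)$ over a large but finite frequency range. This is only a sketch: the duration, frequency step and truncation point are assumed values, and the truncation means the frequency domain result falls slightly short of the exact energy.

```matlab
% Numerical illustration of Parseval's identity for the rectangular pulse
% of Example 2.5.1 (T and the frequency grid are illustrative choices).
T = 2;                               % pulse duration
E_time = T;                          % integral of |u(t)|^2 = 1 over [-T/2, T/2]
df = 1e-3;                           % frequency step
f = (df/2):df:200;                   % midpoints on (0, 200), so f = 0 is never hit
U = sin(pi*f*T)./(pi*f);             % U(f) = T*sinc(f*T)
% |U(f)|^2 is even, so integrate over f > 0 and double; the tail beyond
% |f| = 200 is dropped, which makes E_freq slightly smaller than E_time
E_freq = 2*sum(abs(U).^2)*df;
disp([E_time E_freq]);
```

Extending the upper limit of the frequency grid brings the two numbers closer together, reflecting the slow $1/f^2$ decay of the energy spectrum of a timelimited pulse.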


Next,  we  discuss  the  significance  of  the  Fourier  transform  in  understanding  the  effect  of  LTI 
systems. 

Transfer  function  for  an  LTI  system:  The  transfer  function  H(f)  of  an  LTI  system  is 
defined to be the Fourier transform of its impulse response $h(t)$. That is, $H(f) = \mathcal{F}(h(t))$. We
now discuss its significance.

From (2.37), we know that, when the input to an LTI system is the complex exponential $e^{j2\pi f_0 t}$,
the output is given by $H(f_0) e^{j2\pi f_0 t}$. From the inverse Fourier transform (2.53), we know that
any input $u(t)$ can be expressed as a linear combination of complex exponentials. Thus, the
corresponding response, which we know is given by $y(t) = (u * h)(t)$, must be a linear combination
of the responses to these complex exponentials. Thus, we have

$$y(t) = \int_{-\infty}^{\infty} U(f) H(f) e^{j2\pi ft} \, df$$

We recognize that the preceding function is in the form of an inverse Fourier transform, and
read off $Y(f) = U(f)H(f)$. That is, the Fourier transform of the output is simply the product
of the Fourier transform of the input and the system transfer function. This is because complex
exponentials at different frequencies propagate through an LTI system without mixing with each
other, with a complex exponential at frequency $f$ passing through with a scaling of $H(f)$.

Of course, we have also derived an expression for $y(t)$ in terms of a convolution of the input
signal with the system impulse response: $y(t) = (u * h)(t)$. We can now infer the following key
property.

Convolution in the time domain corresponds to multiplication in the frequency domain

$$y(t) = (u * h)(t) \leftrightarrow Y(f) = U(f)H(f) \tag{2.56}$$

We can also infer the following dual property, either by using duality or by directly deriving it
from first principles.

Multiplication in the time domain corresponds to convolution in the frequency domain

$$y(t) = u(t)v(t) \leftrightarrow Y(f) = (U * V)(f) \tag{2.57}$$


LTI system response to real-valued sinusoidal signals: For a sinusoidal input $u(t) =
\cos(2\pi f_0 t + \theta)$, the response of an LTI system $h$ is given by

$$y(t) = (u * h)(t) = |H(f_0)| \cos(2\pi f_0 t + \theta + \angle H(f_0))$$

This can be inferred from what we know about the response for complex exponentials, thanks
to Euler’s formula. Specifically, we have

$$u(t) = \frac{1}{2}\left(e^{j(2\pi f_0 t + \theta)} + e^{-j(2\pi f_0 t + \theta)}\right) = \frac{1}{2} e^{j\theta} e^{j2\pi f_0 t} + \frac{1}{2} e^{-j\theta} e^{-j2\pi f_0 t}$$

When $u$ goes through an LTI system with transfer function $H(f)$, the output is given by

$$y(t) = \frac{1}{2} e^{j\theta} H(f_0) e^{j2\pi f_0 t} + \frac{1}{2} e^{-j\theta} H(-f_0) e^{-j2\pi f_0 t}$$

If the system is physically realizable, the impulse response $h(t)$ is real-valued, and the transfer
function is conjugate symmetric. Thus, if $H(f_0) = G e^{j\phi}$ ($G > 0$), then $H(-f_0) = H^*(f_0) =
G e^{-j\phi}$. Substituting, we obtain

$$y(t) = \frac{G}{2} e^{j(2\pi f_0 t + \theta + \phi)} + \frac{G}{2} e^{-j(2\pi f_0 t + \theta + \phi)} = G \cos(2\pi f_0 t + \theta + \phi)$$

This yields the well-known result that the sinusoid gets scaled by the magnitude of the transfer
function $G = |H(f_0)|$, and gets phase shifted by the phase of the transfer function $\phi = \angle H(f_0)$.
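
The scaling and phase shift result can be checked on a channel simple enough that the output is available in closed form in the time domain. The sketch below assumes a hypothetical two-path channel $h(t) = \delta(t) + 0.5\,\delta(t - \tau)$, so that $y(t) = u(t) + 0.5\,u(t - \tau)$ and $H(f) = 1 + 0.5 e^{-j2\pi f\tau}$; the frequency, phase and delay values are arbitrary illustrative choices.

```matlab
% Check of the sinusoidal response formula for a simple two-path channel
% h(t) = delta(t) + 0.5*delta(t - tau). All numerical values are illustrative.
f0 = 3; theta = pi/5; tau = 0.12;    % input frequency, phase, path delay
t = 0:0.001:2;                       % time grid
% time-domain output: y(t) = u(t) + 0.5*u(t - tau)
y_time = cos(2*pi*f0*t + theta) + 0.5*cos(2*pi*f0*(t - tau) + theta);
% transfer function at f0 (real impulse response, so H(-f0) = conj(H(f0)))
H0 = 1 + 0.5*exp(-1j*2*pi*f0*tau);
% prediction: scale by |H(f0)| and shift the phase by angle(H(f0))
y_formula = abs(H0)*cos(2*pi*f0*t + theta + angle(H0));
max_err = max(abs(y_time - y_formula));
disp(max_err);
```

The two waveforms should agree to within floating point rounding, since the formula is exact for any real-valued impulse response.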


Example 2.5.6 (Delay spread, coherence bandwidth, and fading for a multipath
channel) The transfer function of a multipath channel as in (2.36) is given by

$$H(f) = \alpha_1 e^{-j2\pi f\tau_1} + \ldots + \alpha_m e^{-j2\pi f\tau_m} \tag{2.58}$$

Thus, the channel transfer function is a linear combination of complex exponentials in the
frequency domain. As with any sinusoids, these can interfere constructively or destructively, leading
to significant fluctuations in $H(f)$ as $f$ varies. For wireless channels, this phenomenon is called
frequency-selective fading. Let us examine the structure of the fading a little further. Suppose,
without loss of generality, that the delays are in increasing order (i.e., $\tau_1 < \tau_2 < \ldots < \tau_m$). We
can then rewrite the transfer function as

$$H(f) = e^{-j2\pi f\tau_1} \sum_{k=1}^{m} \alpha_k e^{-j2\pi f(\tau_k - \tau_1)}$$

The first term $e^{-j2\pi f\tau_1}$ corresponds simply to a pure delay $\tau_1$ (seen by all frequencies), and can
be dropped (taking $\tau_1$ as our time origin, without loss of generality), so that the transfer function
can be rewritten as

$$H(f) = \alpha_1 + \sum_{k=2}^{m} \alpha_k e^{-j2\pi f(\tau_k - \tau_1)} \tag{2.59}$$

The period of the $k$th sinusoid above ($k \geq 2$) is $1/(\tau_k - \tau_1)$, so that the smallest period, and
hence the fastest fluctuations as a function of $f$, occurs because of the largest delay difference
$\tau_d = \tau_m - \tau_1$, which we call the channel delay spread. Thus, for a frequency interval which is
significantly smaller than $1/\tau_d$, the variation of $|H(f)|$ over the interval is small. We define the
channel coherence bandwidth as the inverse of the delay spread, i.e., as $B_c = 1/(\tau_m - \tau_1) =
1/\tau_d$ (this definition is not unique, but in general, the coherence bandwidth is defined to be
inversely proportional to some appropriately defined measure of the channel delay spread). As
we have noted, $H(f)$ can be well modeled as constant over intervals significantly smaller than
the coherence bandwidth.

Let us apply this to the example in Figure 2.14, where we have a multipath channel with impulse
response $h(t) = \delta(t - 1) - 0.5\,\delta(t - 1.5) + 0.5\,\delta(t - 3.5)$. Dropping the first delay as before, we
have

$$H(f) = 1 - 0.5 e^{-j\pi f} + 0.5 e^{-j5\pi f}$$

For concreteness, suppose that time is measured in microseconds (typical numbers for an outdoor
wireless cellular link), so that frequency is measured in MHz. The delay spread is 2.5 $\mu$s, hence
the coherence bandwidth is 400 KHz. We therefore ballpark the size of the frequency interval
over which $H(f)$ can be approximated as constant to about 40 KHz (i.e., of size 10% of the
coherence bandwidth). Note that this is a very fuzzy estimate: if the larger delays occur with
smaller relative amplitudes, as is typical, then they have a smaller effect on $H(f)$, and we could
potentially approximate $H(f)$ as constant over a larger fraction of the coherence bandwidth.
Figure 2.22 depicts the fluctuations in $H(f)$ first on a linear scale, and then on a log scale. A
plot of the transfer function magnitude is shown in Figure 2.22(a). This is the amplitude gain on
a linear scale, and shows significant variations as a function of $f$ (while we do not show it here,
zooming in to 40 KHz bands shows relatively small fluctuations). The amount of fluctuation
becomes even more apparent on a log scale. Interpreting the gain at the smallest delay ($\alpha_1 = 1$
in our case) as that of a nominal channel, the fading gain is defined as the power gain relative
to this nominal, and is given by $20 \log_{10}(|H(f)|/|\alpha_1|)$ in decibels (dB). This is shown in Figure
2.22(b). Note that the fading gain can dip to about -18 dB in our example, which we term a fade
of depth 18 dB. If we are using a “narrowband” signal which has a bandwidth small compared to
the coherence bandwidth, and happen to get hit by such a fade, then we can expect much poorer
performance than nominal. To combat this, one must use diversity. For example, a “wideband”
signal whose bandwidth is larger than the coherence bandwidth provides frequency diversity,
while, if we are constrained to use narrowband signals, we may need to introduce other forms of
diversity (e.g., antenna diversity as in Software Lab 2.2).
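
The numbers quoted for this three-path channel can be reproduced with a few lines of Matlab. The following is a sketch under stated assumptions rather than the code used to generate Figure 2.22: the frequency range and grid spacing are illustrative choices, and the plotting commands are left commented out.

```matlab
% Transfer function and fading gain for the three-path example
% (time in microseconds, frequency in MHz). The frequency grid is an
% illustrative choice.
f = -5:0.001:5;                                   % frequencies in MHz
H = 1 - 0.5*exp(-1j*pi*f) + 0.5*exp(-1j*5*pi*f);  % first delay dropped
gain_lin = abs(H);                                % amplitude gain, linear scale
fading_dB = 20*log10(gain_lin/1);                 % relative to the first path gain of 1
delay_spread = 3.5 - 1;                           % largest minus smallest delay, microseconds
Bc = 1/delay_spread;                              % coherence bandwidth, MHz
fprintf('coherence bandwidth = %g KHz\n', 1000*Bc);
fprintf('deepest fade on this grid = %.2f dB\n', min(fading_dB));
% plot(f, gain_lin); figure; plot(f, fading_dB);  % uncomment to view the curves
```

Zooming the plots into a band that is a small fraction of the coherence bandwidth gives a feel for how flat the channel looks to a narrowband signal.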


2.5.2  Numerical  computation  using  DFT 

In many practical settings, we do not have nice analytical expressions for the Fourier or
inverse Fourier transforms, and must resort to numerical computation, typically using the discrete
Fourier transform (DFT). The DFT of a discrete time sequence $\{u[n], n = 0, \ldots, N-1\}$ of length
$N$ is given by

$$U[m] = \sum_{n=0}^{N-1} u[n] e^{-j2\pi mn/N}, \quad m = 0, 1, \ldots, N-1. \tag{2.60}$$

Figure 2.22: Multipath propagation causes severe frequency-selective fading. Panel (a): transfer
function magnitude (linear scale).
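
The DFT sum (2.60) is simple enough to evaluate directly, which gives a useful sanity check against Matlab's built-in fft function. The sketch below uses an arbitrary test sequence and illustrative variable names; note the shift by one between the index $m$ in (2.60) and Matlab's 1-based array indexing.

```matlab
% Direct evaluation of the DFT sum (2.60), compared against Matlab's fft.
% The test sequence u is an arbitrary illustrative choice.
u = [1 2 0 -1 3 0.5 -2 1];              % length-N sequence u[0], ..., u[N-1]
N = length(u);
n = 0:N-1;                              % time indices
U_direct = zeros(1, N);
for m = 0:N-1
    % U[m] = sum over n of u[n]*exp(-j*2*pi*m*n/N); Matlab arrays are 1-based
    U_direct(m+1) = sum(u.*exp(-1j*2*pi*m*n/N));
end
max_diff = max(abs(U_direct - fft(u)));
disp(max_diff);
```

The direct sum takes on the order of $N^2$ operations, which is why the fft function is preferred for anything but very short sequences.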


Matlab  is  good  at  doing  DFTs.  When  N  is  a  power  of  2,  the  DFT  can  be  computed  very 
efficiently,  and  this  procedure  is  called  a  Fast  Fourier  Transform  (FFT).  Comparing  (2.60)  with 
the  Fourier  transform  expression 


$$U(f) = \int_{-\infty}^{\infty} u(t) e^{-j2\pi ft} \, dt \tag{2.61}$$


we  can  view  the  sum  in  the  DFT  (2.60)  as  an  approximation  for  the  integral  in  (2.61)  under  the 
right  set  of  conditions.  Let  us  first  assume  that  u(t)  =  0  for  t  <  0:  any  waveform  which  can  be 
truncated  so  that  most  of  its  energy  falls  in  a  finite  interval  can  be  shifted  so  that  this  is  true. 
Next,  suppose  that  we  sample  the  waveform  with  spacing  ts  to  get 

$$u[n] = u(n t_s)$$

Now, suppose we want to compute the Fourier transform $U(f)$ for $f = m f_s$, where $f_s$ is the
desired frequency resolution. We can approximate the integral for the Fourier transform by a
sum, using $t_s$-spaced time samples as follows:

$$U(m f_s) = \int_{-\infty}^{\infty} u(t) e^{-j2\pi m f_s t} \, dt \approx \sum_{n} u(n t_s) e^{-j2\pi m f_s n t_s} \, t_s$$

($dt$ in the integral is replaced by the sample spacing $t_s$.) Since $u[n] = u(n t_s)$, the approximation
can be computed using the DFT formula (2.60) as follows:

$$U(m f_s) \approx t_s U[m] \tag{2.62}$$

as long as $f_s t_s = \frac{1}{N}$. That is, using a DFT of length $N$, we can get a frequency granularity of
$f_s = \frac{1}{N t_s}$. This implies that if we choose the time samples close together (in order to represent
$u(t)$ accurately), then we must also use a large $N$ to get a desired frequency granularity. Often
this means that we must pad the time domain samples with zeros.

Another important observation is that, while the DFT in (2.60) ranges from $m = 0, \ldots, N-1$, it
actually computes the Fourier transform for both positive and negative frequencies. Noting that
$e^{j2\pi mn/N} = e^{j2\pi(-N+m)n/N}$, we realize that the DFT values for $m = N/2, \ldots, N-1$ correspond
to the Fourier transform evaluated at frequencies $(m - N) f_s = -\frac{N}{2} f_s, \ldots, -f_s$. The DFT
values for $m = 0, \ldots, N/2 - 1$ correspond to the Fourier transform evaluated at frequencies
$0, f_s, \ldots, (N/2 - 1) f_s$. Thus, we should swap the left and right halves of the DFT output in order
to represent positive and negative frequencies, with DC falling in the middle. Matlab actually
has a function, fftshift, that does this.

Note that the DFT (2.60) is periodic with period $N$, so that the Fourier transform approximation
(2.62) is periodic with period $N f_s = \frac{1}{t_s}$. We typically limit the range of frequencies over which
we use the DFT to compute the Fourier transform to the fundamental period $[-\frac{1}{2t_s}, \frac{1}{2t_s}]$. This is
consistent with the sampling theorem, which says that the sampling rate $1/t_s$ must be at least as
large as the size of the frequency band of interest. (The sampling theorem is reviewed in Chapter
4, when we discuss digital modulation.)


Example 2.5.7 (DFT-based Fourier transform computation) Suppose that we want to
compute the Fourier transform of the sine pulse $u(t) = \sin(\pi t)\, I_{[0,1]}(t)$. The Fourier transform for
this can be computed analytically (see Problem 2.9) to be

$$U(f) = \frac{2\cos(\pi f)\, e^{-j\pi f}}{\pi(1 - 4f^2)} \tag{2.63}$$

Note that $U(f)$ has a 0/0 form at $f = 1/2$, but using L’Hospital’s rule, we can show that
$U(1/2) \neq 0$. Thus, the first zeros of $U(f)$ are at $f = \pm 3/2$. This is a timelimited pulse and
hence cannot be bandlimited, but $U(f)$ decays as $1/f^2$ for $f$ large, so we can capture most of
the energy of the pulse within a suitably chosen finite frequency interval. Let us use the DFT to
compute $U(f)$ over $f \in (-8, 8)$. This means that we set $1/(2t_s) = 8$, or $t_s = 1/16$, which yields
about 16 samples over the interval $[0, 1]$ over which the signal $u(t)$ has support. Suppose now
that we want the frequency granularity to be at least $f_s = 1/160$. Then we must use a DFT
with $N \geq \frac{1}{f_s t_s} = 2560 = N_{min}$. In order to efficiently compute the DFT using the FFT, we
choose $N = 4096$, the next power of 2 at least as large as $N_{min}$. Code fragment 2.5.1 performs
and plots this DFT. The resulting plot (with cosmetic touches not included in the code below)
is displayed in Figure 2.23. It is useful to compare this with a plot obtained from the analytical
formula (2.63), and we leave that as an exercise.


Figure  2.23:  Plot  of  magnitude  spectrum  of  sine  pulse  in  Example  2.5.7  obtained  numerically 
using  the  DFT. 


Code Fragment 2.5.1 Numerical computation of Fourier transform using FFT

ts = 1/16; % sampling interval
time_interval = 0:ts:1; % sampling time instants
% time domain signal evaluated at sampling instants
signal_timedomain = sin(pi*time_interval); % sinusoidal pulse in our example
fs_desired = 1/160; % desired frequency granularity
Nmin = ceil(1/(fs_desired*ts)); % minimum length DFT for desired frequency granularity
% for efficient computation, choose FFT size to be power of 2
Nfft = 2^(nextpow2(Nmin)) % FFT size = the next power of 2 at least as big as Nmin
% Alternatively, one could also use DFT size equal to the minimum length
% Nfft = Nmin;
% note: fft function in Matlab is just the DFT when Nfft is not a power of 2
% freq domain signal computed using DFT
% fft function of size Nfft automatically zeropads as needed
signal_freqdomain = ts*fft(signal_timedomain,Nfft);
% fftshift function shifts DC to center of spectrum
signal_freqdomain_centered = fftshift(signal_freqdomain);
fs = 1/(Nfft*ts); % actual frequency resolution attained
% set of frequencies for which Fourier transform has been computed using DFT
freqs = ((1:Nfft)-1-Nfft/2)*fs;
% plot the magnitude spectrum
plot(freqs,abs(signal_freqdomain_centered));
xlabel('Frequency');
ylabel('Magnitude Spectrum');
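
One possible way to compare the DFT-based estimate with the analytical formula (2.63) is sketched below. It recomputes the DFT with its own illustrative variable names, evaluates (2.63) on the same frequency grid, and patches the two grid points $f = \pm 1/2$ where the formula has a 0/0 form; the limiting values $\mp j/2$ used there follow from L’Hospital’s rule and are worked out here rather than taken from the text.

```matlab
% Compare the DFT-based estimate with the analytical formula (2.63).
% Variable names here are illustrative and independent of the fragment above.
ts = 1/16;                              % sampling interval
t = 0:ts:1;                             % sampling instants on [0, 1]
u = sin(pi*t);                          % sine pulse samples
Nfft = 4096;                            % DFT size used in Example 2.5.7
U_dft = fftshift(ts*fft(u, Nfft));      % approximation (2.62), DC centered
fs = 1/(Nfft*ts);                       % frequency resolution
f = ((0:Nfft-1) - Nfft/2)*fs;           % frequencies from -8 up to just under 8
% analytical transform (2.63)
U_exact = 2*cos(pi*f).*exp(-1j*pi*f)./(pi*(1 - 4*f.^2));
% f = +1/2 and f = -1/2 fall on the grid (0/0 in the formula); the limiting
% values there are -j/2 and +j/2 respectively
mask = abs(abs(f) - 0.5) < fs/2;
U_exact(mask) = -1j*sign(f(mask))/2;
max_err = max(abs(abs(U_dft) - abs(U_exact)));
disp(max_err);
```

Some residual difference is expected, because sampling at rate $1/t_s$ folds the slowly decaying tails of $U(f)$ back into the computed band; reducing ts should shrink it.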


2.6  Energy  Spectral  Density  and  Bandwidth 

Communication channels have frequency-dependent characteristics, hence it is useful to appropriately
shape the frequency domain characteristics of the signals sent over them. Furthermore,
for  wireless  communication  systems,  frequency  spectrum  is  a  particularly  precious  commodity, 
since  wireless  is  a  broadcast  medium  to  be  shared  by  multiple  signals.  It  is  therefore  important 
to  quantify  the  frequency  occupancy  of  communication  signals.  We  provide  a  first  exposure  to 
these  concepts  here  via  the  notion  of  energy  spectral  density  for  finite  energy  signals.  These 
ideas  are  extended  to  finite  power  signals,  for  which  we  can  define  the  analogous  concept  of 
power  spectral  density,  in  Chapter  4,  “just  in  time”  for  our  discussion  of  the  spectral  occupancy 
of  digitally  modulated  signals.  Once  we  know  the  energy  or  power  spectral  density  of  a  signal, 
we  shall  see  that  there  are  a  number  of  possible  definitions  of  bandwidth,  which  is  a  measure  of 
the  size  of  the  frequency  interval  occupied  by  the  signal. 


Figure 2.24: Operational definition of energy spectral density.


Energy Spectral Density: The energy spectral density $E_u(f)$ of a signal $u(t)$ can be defined
operationally as shown in Figure 2.24. Pass the signal $u(t)$ through an ideal narrowband filter
with transfer function as follows:

$$H_{f^*}(f) = \begin{cases} 1, & f^* - \frac{\Delta f}{2} < f < f^* + \frac{\Delta f}{2} \\ 0, & \text{else} \end{cases}$$

The energy spectral density $E_u(f^*)$ is defined to be the energy at the output of the filter, divided
by the width $\Delta f$ (in the limit as $\Delta f \to 0$). That is, the energy at the output of the filter is
approximately $E_u(f^*) \Delta f$. But the Fourier transform of the filter output is

$$Y(f) = U(f) H_{f^*}(f) = \begin{cases} U(f), & f^* - \frac{\Delta f}{2} < f < f^* + \frac{\Delta f}{2} \\ 0, & \text{else} \end{cases}$$

By Parseval’s identity, the energy at the output of the filter is

$$\int_{-\infty}^{\infty} |Y(f)|^2 \, df = \int_{f^* - \frac{\Delta f}{2}}^{f^* + \frac{\Delta f}{2}} |U(f)|^2 \, df \approx |U(f^*)|^2 \, \Delta f$$

assuming that $U(f)$ varies smoothly and $\Delta f$ is small enough. We can now infer that the energy
spectral density is simply the magnitude squared of the Fourier transform:

$$E_u(f) = |U(f)|^2 \tag{2.64}$$


The  integral  of  the  energy  spectral  density  equals  the  signal  energy,  which  is  consistent  with 
Parseval’s  identity. 

The inverse Fourier transform of the energy spectral density has a nice intuitive interpretation.
Noting that $|U(f)|^2 = U(f)U^*(f)$ and $U^*(f) \leftrightarrow u^*(-t)$, let us define $u_{MF}(t) = u^*(-t)$ as (the
impulse response of) the matched filter for $u(t)$, where the reasons for this term will be clarified
later. Then

$$|U(f)|^2 = U(f)U^*(f) \leftrightarrow (u * u_{MF})(\tau) = \int u(t)\, u_{MF}(\tau - t) \, dt = \int u(t)\, u^*(t - \tau) \, dt \tag{2.65}$$

where $t$ is a dummy variable for the integration, and the convolution is evaluated at the time
variable $\tau$, which denotes the delay between the two versions of $u$ being correlated: the extreme
right-hand side is simply the correlation of $u$ with itself (after complex conjugation), evaluated
at different delays $\tau$. We call this the autocorrelation function of the signal $u$. We have therefore
shown the following.

For a finite energy signal, the energy spectral density and the autocorrelation function form a
Fourier transform pair.
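
As a worked instance of this pair, consider again the rectangular pulse of Example 2.5.1. The derivation below is a sketch added for illustration; the symbol $R_u(\tau)$ for the autocorrelation function is notation assumed here, not taken from the text. Correlating the pulse with a copy of itself shifted by $\tau$ gives the length of the overlap of the two pulses, which is a triangle, and its Fourier transform is the squared magnitude of $T\,\mathrm{sinc}(fT)$.

```latex
\begin{align*}
u(t) &= I_{[-T/2,T/2]}(t) \;\leftrightarrow\; U(f) = T\,\mathrm{sinc}(fT) \\
R_u(\tau) &= \int u(t)\,u^*(t-\tau)\,dt =
  \begin{cases} T - |\tau|, & |\tau| \le T \\ 0, & |\tau| > T \end{cases} \\
R_u(\tau) &\;\leftrightarrow\; |U(f)|^2 = T^2\,\mathrm{sinc}^2(fT) \\
R_u(0) &= T = \int |u(t)|^2\,dt = \int |U(f)|^2\,df
\end{align*}
```

The last line shows that the autocorrelation at zero delay equals the signal energy, consistent with Parseval’s identity.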

Bandwidth: The bandwidth of a signal $u(t)$ is loosely defined to be the size of the band
of frequencies occupied by $U(f)$. The definition is “loose” because the concept of occupancy
can vary, depending on the application, since signals are seldom strictly bandlimited. One
possibility is to consider the band over which $|U(f)|^2$ is within some fraction of its peak value
(setting the fraction equal to $\frac{1}{2}$ corresponds to the 3 dB bandwidth). Alternatively, we might
be interested in energy containment bandwidth, which is the size of the smallest band which
contains a specified fraction of the signal energy (for a finite power signal, we define analogously
the power containment bandwidth).

Only positive frequencies count when computing bandwidth for physical (real-valued)
signals: For physically realizable (i.e., real-valued) signals, bandwidth is defined as its occupancy
of positive frequencies, because conjugate symmetry implies that the information at negative
frequencies is redundant.

While physically realizable time domain signals are real-valued, we shall soon introduce
complex-valued signals that have useful physical interpretation, in the sense that they have a well-defined
mapping to physically realizable signals. Conjugate symmetry in the frequency domain does not
hold for complex-valued time domain signals, with different information contained in positive
and negative frequencies in general. Thus, the bandwidth for a complex-valued signal is defined
as the size of the frequency band it occupies over both positive and negative frequencies. The
justification for this convention becomes apparent later in this chapter.


Example  2.6.1  Some  bandwidth  computations 

(a) Consider $u(t) = \mathrm{sinc}(2t)$, where the unit of time is microseconds. Then the unit of frequency
is MHz, and $U(f) = \frac{1}{2} I_{[-1,1]}(f)$ is strictly bandlimited to the band $[-1, 1]$ MHz, which has a
two-sided width of 2 MHz (that is, a bandwidth of 1 MHz when only positive frequencies are
counted, as is the convention for real-valued signals).

(b) Now, consider the timelimited waveform $u(t) = I_{[2,4]}(t)$, where the unit of time is
microseconds. Then $U(f) = 2\,\mathrm{sinc}(2f)\, e^{-j6\pi f}$, which is not bandlimited. The 99% energy containment
bandwidth $W$ is defined by the equation

$$\int_{-W}^{W} |U(f)|^2 \, df = 0.99 \int_{-\infty}^{\infty} |U(f)|^2 \, df = 0.99 \int_{-\infty}^{\infty} |u(t)|^2 \, dt = 0.99 \int_{2}^{4} 1^2 \, dt = 1.98$$

where we use Parseval’s identity to simplify computation for timelimited waveforms. Using the
fact that $|U(f)|$ is even, we obtain that

$$1.98 = 2 \int_{0}^{W} |U(f)|^2 \, df = 2 \int_{0}^{W} 4\,\mathrm{sinc}^2(2f) \, df$$

We can now solve numerically to obtain $W \approx 5.1$ MHz.
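
The numerical solution for $W$ can be obtained, for example, by accumulating the energy spectral density on a fine frequency grid until the contained energy first reaches 1.98. The sketch below is one such approach under stated assumptions: the grid spacing and the 20 MHz search range are illustrative choices, and the sinc function is written out explicitly.

```matlab
% Numerical solution for the 99% energy containment bandwidth W in
% Example 2.6.1(b). The grid spacing df and search range are illustrative.
df = 1e-4;                              % frequency step in MHz
f = (df/2):df:20;                       % midpoints on (0, 20) MHz
esd = 4*(sin(2*pi*f)./(2*pi*f)).^2;     % |U(f)|^2 = 4*sinc^2(2f)
% energy in [-W, W] for W at the right edge of each cell (even integrand)
E_contained = 2*cumsum(esd)*df;
idx = find(E_contained >= 1.98, 1);     % first W reaching 99% of the total energy 2
W = f(idx) + df/2;
fprintf('99%% energy containment bandwidth = %.2f MHz\n', W);
```

Changing the threshold 1.98 to another fraction of the total energy gives the corresponding containment bandwidth with no other modification.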


2.7  Baseband  and  Passband  Signals 

Baseband: A signal $u(t)$ is said to be baseband if the signal energy is concentrated in a band
around DC, and

$$U(f) \approx 0, \quad |f| > W \tag{2.66}$$

for some $W > 0$. Similarly, a channel modeled as a linear time invariant system is said to be
baseband if its transfer function $H(f)$ has support concentrated around DC, and satisfies (2.66).

A signal $u(t)$ is said to be passband if its energy is concentrated in a band away from DC, with

$$U(f) \approx 0, \quad |f \pm f_c| > W \tag{2.67}$$

where $f_c > W > 0$. A channel modeled as a linear time invariant system is said to be passband
if its transfer function $H(f)$ satisfies (2.67).

Examples of baseband and passband signals are shown in Figures 2.25 and 2.26, respectively.
Physically realizable signals must be real-valued in the time domain, which means that their
Fourier transforms, which can be complex-valued, must be conjugate symmetric: $U(-f) =
U^*(f)$. As discussed earlier, the bandwidth $B$ for a real-valued signal $u(t)$ is the size of the
frequency interval (counting only positive frequencies) occupied by $U(f)$.

Information sources typically emit baseband signals. For example, an analog audio signal has
significant frequency content ranging from DC to around 20 kHz. A digital signal in which zeros
and ones are represented by pulses is also a baseband signal, with the frequency content governed
by the shape of the pulse (as we shall see in more detail in Chapter 4). Even when the pulse is
timelimited, and hence not strictly bandlimited, most of the energy is concentrated in a band
around DC.

Figure 2.25: Example of the spectrum $U(f)$ for a real-valued baseband signal. The bandwidth
of the signal is $W$.

Figure 2.26: Example of the spectrum $U(f)$ for a real-valued passband signal. The bandwidth
of the signal is $W$. The figure shows an arbitrarily chosen frequency $f_c$ within the band in which
$U(f)$ is nonzero. Typically, $f_c$ is much larger than the signal bandwidth $W$.

Wired channels (e.g., telephone lines, USB connectors) are typically modeled as baseband: the
attenuation over the wire increases with frequency, so that it makes sense to design the
transmitted signal to utilize a frequency band around DC. An example of passband communication over
a wire is Digital Subscriber Line (DSL), where high speed data transmission using frequencies
above 25 kHz co-exists with voice transmission in the band from 0-4 kHz. The design and use
of passband signals for communication is particularly important for wireless communication, in
which the transmitted signals must fit within frequency bands dictated by regulatory agencies,
such as the Federal Communications Commission (FCC) in the United States. For example, an
amplitude modulation (AM) radio signal typically occupies a frequency interval of length 10 kHz
somewhere in the 540-1600 kHz band allocated for AM radio. Thus, the baseband audio message
signal must be transformed into a passband signal before it can be sent over the passband
channel spanning the desired band. As another example, a transmitted signal in a WiFi network
may be designed to fit within a 20 MHz frequency interval in the 2.4 GHz unlicensed band, so
that digital messages to be sent over WiFi must be encoded onto passband signals occupying the
designated spectral band.


2.8  The  Structure  of  a  Passband  Signal 

In  order  to  employ  a  passband  channel  for  communication,  we  need  to  understand  how  to  design 
a  passband  transmitted  signal  to  carry  information,  and  how  to  recover  this  information  from  a 
passband  received  signal.  We  also  need  to  understand  how  the  transmitted  signal  is  affected  by 
a passband channel.

Figure 2.27: A baseband message of bandwidth $W$ is translated to passband by multiplying by
a sinusoid at frequency $f_c$, as long as $f_c > W$.


2.8.1  Time  Domain  Relationships 

Let us start by considering a real-valued baseband message signal $m(t)$ of bandwidth $W$, to be
sent over a passband channel centered around $f_c$. As illustrated in Figure 2.27, we can translate
the message to passband simply by multiplying it by a sinusoid at $f_c$:

$$u_p(t) = m(t)\cos 2\pi f_c t \;\leftrightarrow\; U_p(f) = \frac{1}{2}\left(M(f - f_c) + M(f + f_c)\right)$$

We use the term carrier frequency for $f_c$, and the term carrier for a sinusoid at the carrier
frequency, since the modulated sinusoid is "carrying" the message information over a passband
channel. Instead of a cosine, we could also use a sine:

$$v_p(t) = m(t)\sin 2\pi f_c t \;\leftrightarrow\; V_p(f) = \frac{1}{2j}\left(M(f - f_c) - M(f + f_c)\right)$$

Note that $|U_p(f)|$ and $|V_p(f)|$ have frequency content in a band around $f_c$, and are passband
signals (i.e., living in a band not containing DC) as long as $f_c > W$.

I and Q components: If we use both the cosine and sine carriers, we can construct a passband
signal of the form

$$u_p(t) = u_c(t)\cos 2\pi f_c t - u_s(t)\sin 2\pi f_c t \tag{2.68}$$

where $u_c$ and $u_s$ are real baseband signals of bandwidth at most $W$, with $f_c > W$. The signal
$u_c(t)$ is called the in-phase (or I) component, and $u_s(t)$ is called the quadrature (or Q) component.
The negative sign for the Q term is a standard convention. Since the sinusoidal terms are entirely
predictable once we specify $f_c$, all information in the passband signal $u_p$ must be contained in
the I and Q components. Modulation for a passband channel therefore corresponds to choosing
a method of encoding information into the I and Q components of the transmitted signal, while
demodulation corresponds to extracting this information from the received passband signal. In
order to accomplish modulation and demodulation, we must be able to upconvert from baseband
to passband, and downconvert from passband to baseband, as follows.

Figure 2.28: Upconversion from baseband to passband, and downconversion from passband to
baseband.


Upconversion and downconversion: Equation (2.68) immediately tells us how to upconvert
from baseband to passband. To downconvert from passband to baseband, consider

$$2u_p(t)\cos 2\pi f_c t = 2u_c(t)\cos^2 2\pi f_c t - 2u_s(t)\sin 2\pi f_c t\,\cos 2\pi f_c t = u_c(t) + u_c(t)\cos 4\pi f_c t - u_s(t)\sin 4\pi f_c t$$

The first term on the extreme right-hand side is the I component $u_c(t)$, a baseband signal. The
second and third terms are passband signals at $2f_c$, which we can get rid of by lowpass filtering.
Similarly, we can obtain the Q component $u_s(t)$ by lowpass filtering $-2u_p(t)\sin 2\pi f_c t$. Block
diagrams for upconversion and downconversion are depicted in Figure 2.28. Implementation of
these operations could, in practice, be done in multiple stages, and requires careful analog circuit
design.
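
The upconversion and downconversion operations can be tried out numerically. The following is a rough sketch under stated assumptions: the sample rate, the carrier frequency, the 2 Hz and 3 Hz test tones, and the use of a moving average over one carrier period as a crude lowpass filter are illustrative choices, not part of the text.

```python
import math

def upconvert(uc, us, fc, t):
    """Passband samples u_p = u_c cos(2 pi fc t) - u_s sin(2 pi fc t), as in (2.68)."""
    return [c * math.cos(2 * math.pi * fc * tk) - s * math.sin(2 * math.pi * fc * tk)
            for c, s, tk in zip(uc, us, t)]

def moving_average(x, n):
    """Crude lowpass filter: centered moving average over n samples (n odd)."""
    half = n // 2
    out = []
    for k in range(len(x)):
        window = x[max(0, k - half):k + half + 1]
        out.append(sum(window) / len(window))
    return out

def downconvert(up, fc, t, n):
    """Mix with 2cos and -2sin at fc, then lowpass, to estimate (u_c, u_s)."""
    i_mixed = [2 * x * math.cos(2 * math.pi * fc * tk) for x, tk in zip(up, t)]
    q_mixed = [-2 * x * math.sin(2 * math.pi * fc * tk) for x, tk in zip(up, t)]
    return moving_average(i_mixed, n), moving_average(q_mixed, n)

fs, fc = 8100.0, 100.0                      # assumed sample rate and carrier (Hz)
t = [k / fs for k in range(int(fs))]        # one second of samples
uc = [math.cos(2 * math.pi * 2 * tk) for tk in t]   # assumed 2 Hz I component
us = [math.sin(2 * math.pi * 3 * tk) for tk in t]   # assumed 3 Hz Q component

up = upconvert(uc, us, fc, t)
n = int(fs / fc)                            # 81 samples, i.e., one carrier period
uc_hat, us_hat = downconvert(up, fc, t, n)

interior = range(n, len(t) - n)             # ignore filter edge effects
err_i = max(abs(uc_hat[k] - uc[k]) for k in interior)
err_q = max(abs(us_hat[k] - us[k]) for k in interior)
print(f"max I error {err_i:.3f}, max Q error {err_q:.3f}")
```

The moving average only approximately removes the terms at $2f_c$, so the recovered components carry a small residual ripple; a sharper lowpass filter would reduce it.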

We  now  dig  deeper  into  the  structure  of  a  passband  signal.  First,  can  we  choose  the  I  and  Q 
components  freely,  independent  of  each  other?  The  answer  is  yes:  the  I  and  Q  components 
provide  two  parallel,  orthogonal  “channels”  for  encoding  information,  as  we  show  next. 

Orthogonality of I and Q channels: The passband waveform $a_p(t) = u_c(t)\cos 2\pi f_c t$
corresponding to the I component, and the passband waveform $b_p(t) = u_s(t)\sin 2\pi f_c t$ corresponding
to the Q component, are orthogonal. That is,

$$\langle a_p, b_p \rangle = 0 \tag{2.69}$$

Let

$$x(t) = a_p(t)b_p(t) = u_c(t)u_s(t)\cos 2\pi f_c t\,\sin 2\pi f_c t = \frac{1}{2}u_c(t)u_s(t)\sin 4\pi f_c t$$

We prove the desired result by showing that $x(t)$ is a passband signal at $2f_c$, so that its DC
component is zero. That is,

$$\int_{-\infty}^{\infty} x(t)\,dt = X(0) = 0$$

which is the desired result. To show this, note that

$$p(t) = \frac{1}{2}u_c(t)u_s(t) \;\leftrightarrow\; \frac{1}{2}(U_c * U_s)(f)$$

is a baseband signal: if $U_c(f)$ is baseband with bandwidth $W_1$ and $U_s(f)$ is baseband with
bandwidth $W_2$, then their convolution has bandwidth at most $W_1 + W_2$. In order for $a_p$ to be
passband, we must have $f_c > W_1$, and in order for $b_p$ to be passband, we must have $f_c > W_2$.
Thus, $2f_c > W_1 + W_2$, which means that $x(t) = p(t)\sin 4\pi f_c t$ is passband around $2f_c$, and is
therefore zero at DC. This completes the derivation.
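
As a quick numerical sanity check of (2.69), the sketch below evaluates the inner product for the I and Q components used in Example 2.8.1. The midpoint-rule integration, the step size and the helper names are assumptions of this sketch. Since these components are timelimited rather than strictly bandlimited, the inner product is expected to be small relative to the signal energies rather than exactly zero.

```python
import math

FC = 150.0  # carrier frequency in Hz, matching cos(300 pi t) and sin(300 pi t)

def u_c(t):
    """I component: indicator of [0, 1]."""
    return 1.0 if 0.0 <= t <= 1.0 else 0.0

def u_s(t):
    """Q component: triangle (1 - |t|) on [-1, 1]."""
    return max(0.0, 1.0 - abs(t))

def a_p(t):
    """Passband waveform carrying the I component."""
    return u_c(t) * math.cos(2 * math.pi * FC * t)

def b_p(t):
    """Passband waveform carrying the Q component."""
    return u_s(t) * math.sin(2 * math.pi * FC * t)

def inner(x, y, t0=-1.0, t1=1.0, dt=1e-4):
    """Midpoint-rule approximation of the integral of x(t) y(t) over [t0, t1]."""
    n = int(round((t1 - t0) / dt))
    return sum(x(t0 + (k + 0.5) * dt) * y(t0 + (k + 0.5) * dt) for k in range(n)) * dt

energy_a = inner(a_p, a_p)
energy_b = inner(b_p, b_p)
corr = inner(a_p, b_p) / math.sqrt(energy_a * energy_b)   # normalized inner product
print(f"normalized inner product of a_p and b_p: {corr:.2e}")
```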


Example 2.8.1 (Passband signal): The signal

$$u_p(t) = I_{[0,1]}(t)\cos 300\pi t - (1 - |t|)I_{[-1,1]}(t)\sin 300\pi t$$

is a passband signal with I component $u_c(t) = I_{[0,1]}(t)$ and Q component $u_s(t) = (1 - |t|)I_{[-1,1]}(t)$.
This example illustrates that we do not require strict bandwidth limitations in our definitions
of passband and baseband: the I and Q components are timelimited, and hence cannot be
bandlimited. However, they are termed baseband signals because most of their energy lies in
baseband. Similarly, $u_p(t)$ is termed a passband signal, since most of its frequency content lies
in a small band around 150 Hz.


Envelope and phase: Since a passband signal $u_p$ is equivalent to a pair of real-valued baseband
waveforms $(u_c, u_s)$, passband modulation is often called two-dimensional modulation. The
representation (2.68) in terms of I and Q components corresponds to thinking of this two-dimensional
waveform in rectangular coordinates (the "cosine axis" and the "sine axis"). We can also
represent the passband waveform using polar coordinates. Consider the rectangular-polar
transformation

$$e(t) = \sqrt{u_c^2(t) + u_s^2(t)}, \quad \theta(t) = \tan^{-1}\frac{u_s(t)}{u_c(t)}$$

where $e(t) \ge 0$ is termed the envelope and $\theta(t)$ is the phase. This corresponds to
$u_c(t) = e(t)\cos\theta(t)$ and $u_s(t) = e(t)\sin\theta(t)$. Substituting in (2.68), we obtain

$$u_p(t) = e(t)\cos\theta(t)\cos 2\pi f_c t - e(t)\sin\theta(t)\sin 2\pi f_c t = e(t)\cos\left(2\pi f_c t + \theta(t)\right) \tag{2.70}$$

This provides an alternate representation of the passband signal in terms of baseband envelope
and phase signals.

Complex envelope: To obtain a third representation of a passband signal, we note that a
two-dimensional point can also be mapped to a complex number; see Section 2.1. We define the
complex envelope $u(t)$ of the passband signal $u_p(t)$ in (2.68) and (2.70) as follows:

$$u(t) = u_c(t) + ju_s(t) = e(t)e^{j\theta(t)} \tag{2.71}$$

We can now express the passband signal in terms of its complex envelope. From (2.70), we see
that

$$u_p(t) = e(t)\,\mathrm{Re}\left(e^{j(2\pi f_c t + \theta(t))}\right) = \mathrm{Re}\left(e(t)e^{j(2\pi f_c t + \theta(t))}\right) = \mathrm{Re}\left(e(t)e^{j\theta(t)}e^{j2\pi f_c t}\right)$$

This leads to our third representation of a passband signal:

$$u_p(t) = \mathrm{Re}\left(u(t)e^{j2\pi f_c t}\right) \tag{2.72}$$

Figure 2.29: Geometry of the complex envelope.


While we have obtained (2.72) using the polar representation (2.70), we should also check that
it is consistent with the rectangular representation (2.68), writing out the real and imaginary
parts of the complex waveforms above as follows:

$$u(t)e^{j2\pi f_c t} = \left(u_c(t) + ju_s(t)\right)\left(\cos 2\pi f_c t + j\sin 2\pi f_c t\right) = \left(u_c(t)\cos 2\pi f_c t - u_s(t)\sin 2\pi f_c t\right) + j\left(u_s(t)\cos 2\pi f_c t + u_c(t)\sin 2\pi f_c t\right) \tag{2.73}$$

Taking the real part, we obtain the expression (2.68) for $u_p(t)$.
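
The agreement between the rectangular form (2.68), the polar form (2.70) and the complex envelope form (2.72) can also be checked numerically. In the sketch below, the carrier frequency and the particular I and Q components are hypothetical choices made only for illustration, and `math.atan2` is used as a four-quadrant version of the inverse tangent.

```python
import cmath
import math

FC = 10.0  # illustrative carrier frequency in Hz (an assumption)

def u_c(t):
    """Illustrative I component (hypothetical choice)."""
    return math.exp(-t * t)

def u_s(t):
    """Illustrative Q component (hypothetical choice)."""
    return t * math.exp(-t * t)

def passband_iq(t):
    """Rectangular form (2.68)."""
    w = 2 * math.pi * FC * t
    return u_c(t) * math.cos(w) - u_s(t) * math.sin(w)

def passband_polar(t):
    """Envelope and phase form (2.70)."""
    e = math.hypot(u_c(t), u_s(t))
    theta = math.atan2(u_s(t), u_c(t))   # four-quadrant inverse tangent
    return e * math.cos(2 * math.pi * FC * t + theta)

def passband_complex(t):
    """Complex envelope form (2.72)."""
    u = complex(u_c(t), u_s(t))
    return (u * cmath.exp(1j * 2 * math.pi * FC * t)).real

times = [k / 1000.0 - 2.0 for k in range(4001)]
max_gap = max(max(abs(passband_iq(t) - passband_polar(t)),
                  abs(passband_iq(t) - passband_complex(t))) for t in times)
print(f"largest disagreement between the three forms: {max_gap:.2e}")
```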

The  relationship  between  the  three  time  domain  representations  of  a  passband  signal  in  terms 
of  its  complex  envelope  is  depicted  in  Figure  2.29.  We  now  specify  the  corresponding  frequency 
domain  relationship. 

Information  resides  in  complex  baseband:  The  complex  baseband  representation  corre¬ 
sponds  to  subtracting  out  the  rapid,  but  predictable,  phase  variation  due  to  the  fixed  reference 
frequency  fc,  and  then  considering  the  much  slower  amplitude  and  phase  variations  induced  by 
baseband  modulation.  Since  the  phase  variation  due  to  fc  is  predictable,  it  cannot  convey  any 
information.  Thus,  all  the  information  in  a  passband  signal  is  contained  in  its  complex  envelope. 

Choice of frequency/phase reference is arbitrary: We can define the complex baseband
representation of a passband signal using an arbitrary frequency reference $f_c$ (and can also vary
the phase reference), as long as we satisfy $f_c > W$, where $W$ is the bandwidth. We may often wish
to transform between the complex baseband representations for two different references. For
example, we can write

$$u_p(t) = u_{c1}(t)\cos(2\pi f_1 t + \theta_1) - u_{s1}(t)\sin(2\pi f_1 t + \theta_1) = u_{c2}(t)\cos(2\pi f_2 t + \theta_2) - u_{s2}(t)\sin(2\pi f_2 t + \theta_2)$$

We can express this more compactly in terms of the complex envelopes $u_1 = u_{c1} + ju_{s1}$ and
$u_2 = u_{c2} + ju_{s2}$:

$$u_p(t) = \mathrm{Re}\left(u_1(t)e^{j(2\pi f_1 t + \theta_1)}\right) = \mathrm{Re}\left(u_2(t)e^{j(2\pi f_2 t + \theta_2)}\right) \tag{2.74}$$

We can now find the relationship between these complex envelopes by transforming the
exponential term for one reference to the other:

$$u_p(t) = \mathrm{Re}\left(u_1(t)e^{j(2\pi f_1 t + \theta_1)}\right) = \mathrm{Re}\left(\left[u_1(t)e^{j(2\pi (f_1 - f_2)t + \theta_1 - \theta_2)}\right]e^{j(2\pi f_2 t + \theta_2)}\right) \tag{2.75}$$

Comparing the extreme right-hand sides of (2.74) and (2.75), we can read off that

$$u_2(t) = u_1(t)e^{j(2\pi (f_1 - f_2)t + \theta_1 - \theta_2)}$$

While we derived this result using algebraic manipulations, it has the following intuitive
interpretation: if the instantaneous phase $2\pi f_i t + \theta_i$ of the reference is ahead/behind, then the complex
envelope must be correspondingly retarded/advanced, so that the instantaneous phase of the
overall passband signal stays the same. We illustrate this via some examples below.
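
A minimal numerical sketch of this change of reference follows. The helper names `passband` and `change_reference` are hypothetical, and the test signal (a unit box on $[-1, 1]$ with the reference moved from 200 Hz to 200.5 Hz) is chosen only for illustration. The check is that both (envelope, reference) pairs describe the same passband signal.

```python
import cmath
import math

def passband(u, f_ref, theta_ref, t):
    """Passband value Re(u(t) e^{j(2 pi f_ref t + theta_ref)}), as in (2.74)."""
    return (u(t) * cmath.exp(1j * (2 * math.pi * f_ref * t + theta_ref))).real

def change_reference(u1, f1, theta1, f2, theta2):
    """Complex envelope with respect to (f2, theta2), given u1 with respect to (f1, theta1)."""
    def u2(t):
        return u1(t) * cmath.exp(1j * (2 * math.pi * (f1 - f2) * t + theta1 - theta2))
    return u2

def u1(t):
    """Illustrative complex envelope: indicator of [-1, 1] (purely real)."""
    return complex(1.0 if abs(t) <= 1.0 else 0.0, 0.0)

f1, theta1 = 200.0, 0.0      # original reference, cos(400 pi t)
f2, theta2 = 200.5, 0.0      # new reference, cos(401 pi t)
u2 = change_reference(u1, f1, theta1, f2, theta2)

times = [k / 500.0 - 1.5 for k in range(1501)]
mismatch = max(abs(passband(u1, f1, theta1, t) - passband(u2, f2, theta2, t))
               for t in times)
print(f"largest passband mismatch between the two references: {mismatch:.2e}")
print("u2(0.25) =", u2(0.25))   # equals e^{-j pi/4}: envelope rotates as the reference runs ahead
```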

Example 2.8.2 (Change of reference frequency/phase): Consider the passband signal
$u_p(t) = I_{[-1,1]}(t)\cos 400\pi t$.

(a) Find the output when $u_p(t)\cos 401\pi t$ is passed through a lowpass filter.

(b) Find the output when $u_p(t)\sin(400\pi t - \frac{\pi}{4})$ is passed through a lowpass filter.

Solution: From Figure 2.28, we recognize that both (a) and (b) correspond to downconversion
operations with different frequency and phase references. Thus, by converting the complex
envelope with respect to the appropriate reference, we can read off the answers.

(a) Letting $u_1 = u_{c1} + ju_{s1}$ denote the complex envelope with respect to the reference $e^{j401\pi t}$, we
recognize that the output of the LPF is $u_{c1}/2$. The passband signal can be written as

$$u_p(t) = I_{[-1,1]}(t)\cos 400\pi t = \mathrm{Re}\left(I_{[-1,1]}(t)e^{j400\pi t}\right)$$

We can now massage it to read off the complex envelope for the new reference:

$$u_p(t) = \mathrm{Re}\left(I_{[-1,1]}(t)e^{-j\pi t}e^{j401\pi t}\right)$$

from which we see that $u_1(t) = I_{[-1,1]}(t)e^{-j\pi t} = I_{[-1,1]}(t)\left(\cos\pi t - j\sin\pi t\right)$. Taking real and
imaginary parts, we obtain $u_{c1}(t) = I_{[-1,1]}(t)\cos\pi t$ and $u_{s1}(t) = -I_{[-1,1]}(t)\sin\pi t$, respectively.
Thus, the LPF output is $\frac{1}{2}I_{[-1,1]}(t)\cos\pi t$.

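(b) Letting $u_2 = u_{c2} + ju_{s2}$ denote the complex envelope with respect to the reference $e^{j(400\pi t - \frac{\pi}{4})}$,
we recognize that the output of the LPF is $-u_{s2}/2$. We can convert to the new reference as
before:

$$u_p(t) = \mathrm{Re}\left(I_{[-1,1]}(t)e^{j\frac{\pi}{4}}e^{j(400\pi t - \frac{\pi}{4})}\right)$$

which gives the complex envelope $u_2 = I_{[-1,1]}(t)\left(\cos\frac{\pi}{4} + j\sin\frac{\pi}{4}\right)$. Taking real and
imaginary parts, we obtain $u_{c2}(t) = I_{[-1,1]}(t)\cos\frac{\pi}{4}$ and $u_{s2}(t) = I_{[-1,1]}(t)\sin\frac{\pi}{4}$, respectively. Thus,
the LPF output is given by $-u_{s2}/2 = -\frac{1}{2}I_{[-1,1]}(t)\sin\frac{\pi}{4} = -\frac{1}{2\sqrt{2}}I_{[-1,1]}(t)$.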

From a practical point of view, keeping track of frequency/phase references becomes important
for the task of synchronization. For example, the carrier frequency used by the transmitter for
upconversion may not be exactly equal to that used by the receiver for downconversion. Thus,
the receiver must compensate for the phase rotation incurred by the complex envelope at the
output of the downconverter, as illustrated by the following example.

Example 2.8.3 (Modeling and compensating for frequency/phase offsets in complex
baseband): Consider the passband signal $u_p$ in (2.68), with complex baseband representation
$u = u_c + ju_s$. Now, consider a phase-shifted version of the passband signal

$$\tilde{u}_p(t) = u_c(t)\cos\left(2\pi f_c t + \theta(t)\right) - u_s(t)\sin\left(2\pi f_c t + \theta(t)\right)$$

where $\theta(t)$ may vary slowly with time. For example, a carrier frequency offset $\Delta f$ and a phase
offset $\gamma$ correspond to $\theta(t) = 2\pi\Delta f\,t + \gamma$. Suppose, now, that the signal is downconverted as
in Figure 2.28, where we take the phase reference as that of the receiver's local oscillator (LO).
How do the I and Q components depend on the phase offset of the received signal relative to the
LO? The easiest way to answer this is to find the complex envelope of $\tilde{u}_p$ with respect to $f_c$. To
do this, we write $\tilde{u}_p$ in the standard form (2.72) as follows:

$$\tilde{u}_p(t) = \mathrm{Re}\left(u(t)e^{j(2\pi f_c t + \theta(t))}\right)$$

Comparing with the desired form

$$\tilde{u}_p(t) = \mathrm{Re}\left(\tilde{u}(t)e^{j2\pi f_c t}\right)$$

we can read off

$$\tilde{u}(t) = u(t)e^{j\theta(t)} \tag{2.76}$$

Equation (2.76) relates the complex envelopes before and after a phase offset. We can expand
out this "polar form" representation to obtain the corresponding relationship between the I and
Q components. Suppressing time dependence from the notation, we can rewrite (2.76) as

$$\tilde{u}_c + j\tilde{u}_s = (u_c + ju_s)(\cos\theta + j\sin\theta)$$

using Euler's formula. Equating real and imaginary parts on both sides, we obtain

$$\tilde{u}_c = u_c\cos\theta - u_s\sin\theta, \quad \tilde{u}_s = u_c\sin\theta + u_s\cos\theta \tag{2.77}$$

The phase offset therefore results in the I and Q components being mixed together at the output
of the downconverter. Thus, for a coherent receiver to recover the original I and Q components
$u_c$, $u_s$, we must account for the (possibly time-varying) phase offset $\theta(t)$. In particular, if we have
an estimate of the phase offset, then we can undo it by inverting the relationship in (2.76):

$$u(t) = \tilde{u}(t)e^{-j\theta(t)} \tag{2.78}$$

which can be written out in terms of real-valued operations as follows:

$$u_c = \tilde{u}_c\cos\theta + \tilde{u}_s\sin\theta, \quad u_s = -\tilde{u}_c\sin\theta + \tilde{u}_s\cos\theta \tag{2.79}$$

The preceding computations provide a typical example of the advantage of working in complex
baseband. Relationships between passband signals can be compactly represented in complex
baseband, as in (2.76) and (2.78). For signal processing using real-valued arithmetic, these
complex baseband relationships can be expanded out to obtain relationships involving real-valued
quantities, as in (2.77) and (2.79). See Software Lab 2.1 for an example of such computations.
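
The real-valued relationships (2.77) and (2.79) can be sketched in a few lines of code. The function names, the 5 Hz frequency offset, the 0.3 radian phase offset and the test tones below are assumptions made for illustration; the sketch also cross-checks the real-valued rotation against the complex multiplication in (2.76).

```python
import cmath
import math

def apply_phase_offset(uc, us, theta):
    """I and Q components after a phase offset theta, as in (2.77)."""
    return (uc * math.cos(theta) - us * math.sin(theta),
            uc * math.sin(theta) + us * math.cos(theta))

def undo_phase_offset(uc_t, us_t, theta):
    """Recover the original I and Q components given theta, as in (2.79)."""
    return (uc_t * math.cos(theta) + us_t * math.sin(theta),
            -uc_t * math.sin(theta) + us_t * math.cos(theta))

delta_f, gamma = 5.0, 0.3     # assumed frequency offset (Hz) and phase offset (rad)
fs = 1000.0                   # assumed sample rate (Hz)

worst_vs_complex = 0.0        # real-valued rotation versus u e^{j theta}
worst_recovery = 0.0          # error after compensation
for k in range(200):
    t = k / fs
    theta = 2 * math.pi * delta_f * t + gamma
    uc = math.cos(2 * math.pi * 2 * t)        # illustrative I component
    us = math.sin(2 * math.pi * 3 * t)        # illustrative Q component

    uc_t, us_t = apply_phase_offset(uc, us, theta)
    rotated = complex(uc, us) * cmath.exp(1j * theta)     # (2.76)
    worst_vs_complex = max(worst_vs_complex, abs(complex(uc_t, us_t) - rotated))

    uc_r, us_r = undo_phase_offset(uc_t, us_t, theta)
    worst_recovery = max(worst_recovery, abs(uc_r - uc), abs(us_r - us))

print(f"rotation vs complex multiply: {worst_vs_complex:.2e}")
print(f"recovery error after compensation: {worst_recovery:.2e}")
```

In a receiver, `theta` would be an estimate rather than the true offset, so the recovery error would depend on the quality of that estimate.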


2.8.2  Frequency  Domain  Relationships 

Consider an arbitrary complex-valued baseband waveform $u(t)$ whose frequency content is
contained in $[-W, W]$, and suppose that $f_c > W$. We want to show that

$$u_p(t) = \mathrm{Re}\left(u(t)e^{j2\pi f_c t}\right) = \mathrm{Re}\left(c(t)\right) \tag{2.80}$$

is a real-valued passband signal whose frequency content is concentrated around $\pm f_c$, away from DC. Let

$$c(t) = u(t)e^{j2\pi f_c t} \;\leftrightarrow\; C(f) = U(f - f_c) \tag{2.81}$$

That is, $C(f)$ is the spectrum $U(f)$ of the complex envelope, shifted to the right by $f_c$. Since $U(f)$ has
frequency content in $[-W, W]$, $C(f)$ has frequency content in $[f_c - W, f_c + W]$. Since $f_c - W > 0$, this
band does not include DC. Now,

$$u_p(t) = \mathrm{Re}\left(c(t)\right) = \frac{1}{2}\left(c(t) + c^*(t)\right) \;\leftrightarrow\; U_p(f) = \frac{1}{2}\left(C(f) + C^*(-f)\right)$$


Figure 2.30: Frequency domain relationship between a real-valued passband signal and its
complex envelope. The figure shows the spectrum $U_p(f)$ of the passband signal, its scaled restriction
to positive frequencies $C(f)$, and the spectrum $U(f)$ of the complex envelope.

Since $C^*(-f)$ is the complex conjugated version of $C(f)$, flipped around the origin, it has
frequency content in the band of negative frequencies $[-f_c - W, -f_c + W]$ around $-f_c$, which does
not include DC because $-f_c + W < 0$. Thus, we have shown that $u_p(t)$ is a passband signal. It
is real-valued by virtue of its construction using the time domain equation (2.80), which involves
taking the real part. But we can also doublecheck for consistency in the frequency domain:
$U_p(f)$ is conjugate symmetric, since its positive frequency component is $\frac{1}{2}C(f)$, and its
negative frequency component is $\frac{1}{2}C^*(-f)$. Substituting $C(f)$ by $U(f - f_c)$, we obtain the passband
spectrum in terms of the complex baseband spectrum:

$$U_p(f) = \frac{1}{2}\left(U(f - f_c) + U^*(-f - f_c)\right) \tag{2.82}$$

So far, we have seen how to construct a real-valued passband signal given a complex-valued
baseband signal. To go in reverse, we must answer the following: do the equivalent
representations (2.68), (2.70), (2.72) and (2.82) hold for any passband signal, and if so, how do we find
the spectrum of the complex envelope given the spectrum of the passband signal? To answer
these questions, we simply trace back the steps we used to arrive at (2.82). Given the
spectrum $U_p(f)$ for a real-valued passband signal $u_p(t)$, we construct $C(f)$ as a scaled version of
$U_p^+(f) = U_p(f)I_{[0,\infty)}(f)$, the positive frequency part of $U_p(f)$, as follows:

$$C(f) = 2U_p^+(f) = \begin{cases} 2U_p(f), & f > 0 \\ 0, & f < 0 \end{cases}$$

This means that $U_p(f) = \frac{1}{2}C(f)$ for positive frequencies. By the conjugate symmetry of $U_p(f)$,
the negative frequency component must be $\frac{1}{2}C^*(-f)$, so that $U_p(f) = \frac{1}{2}C(f) + \frac{1}{2}C^*(-f)$. In the
time domain, this corresponds to

$$u_p(t) = \frac{1}{2}c(t) + \frac{1}{2}c^*(t) = \mathrm{Re}\left(c(t)\right) \tag{2.83}$$

Now, let us define the complex envelope as follows:

$$u(t) = c(t)e^{-j2\pi f_c t} \;\leftrightarrow\; U(f) = C(f + f_c)$$

Since $c(t) = u(t)e^{j2\pi f_c t}$, we obtain the representation (2.72), and hence the desired
relationship (2.68), on substituting into (2.83).
Since $C(f)$ has frequency content in a band around $f_c$, $U(f)$, which is obtained by shifting $C(f)$
to the left by $f_c$, is indeed a baseband signal with frequency content in a band around DC.

Frequency domain expressions for I and Q components: If we are given the time domain
complex envelope, we can read off the I and Q components as the real and imaginary parts:

$$u_c(t) = \mathrm{Re}\left(u(t)\right) = \frac{1}{2}\left(u(t) + u^*(t)\right), \quad u_s(t) = \mathrm{Im}\left(u(t)\right) = \frac{1}{2j}\left(u(t) - u^*(t)\right)$$

Taking Fourier transforms, we obtain

$$U_c(f) = \frac{1}{2}\left(U(f) + U^*(-f)\right), \quad U_s(f) = \frac{1}{2j}\left(U(f) - U^*(-f)\right)$$

Figure 2.30 shows the relation between the passband signal $U_p(f)$, its scaled version $C(f)$
restricted to positive frequencies, and the complex baseband signal $U(f)$. As this example
emphasizes, all of these spectra can, in general, be complex-valued. Equation (2.80) corresponds to
starting with an arbitrary baseband signal $U(f)$ as in the bottom of the figure, and constructing
$C(f)$ as depicted in the middle of the figure. We then use $C(f)$ to construct a conjugate
symmetric passband signal $U_p(f)$, proceeding from the middle of the figure to the top. This example
also shows that $U(f)$ does not, in general, obey conjugate symmetry, so that the baseband signal
$u(t)$ is, in general, complex-valued. However, by construction, $U_p(f)$ is conjugate symmetric, and
hence the passband signal $u_p(t)$ is real-valued.

Example 2.8.4 Let $v_p(t)$ denote a real-valued passband signal, with Fourier transform $V_p(f)$
specified as follows for negative frequencies:

$$V_p(f) = \begin{cases} -(f + 99), & -101 \le f \le -99 \\ 0, & f < -101 \text{ or } -99 < f \le 0 \end{cases}$$

(a) Sketch $V_p(f)$ for both positive and negative frequencies.

(b) Without explicitly taking the inverse Fourier transform, can you say whether $v_p(t) = v_p(-t)$
or not?

(c) Find and sketch $V_c(f)$ and $V_s(f)$, the Fourier transforms of the I and Q components with
respect to a reference frequency $f_c = 99$. Do this without going to the time domain.

(d) Find an explicit time domain expression for the output when $v_p(t)\cos 200\pi t$ is passed through
an ideal lowpass filter of bandwidth 4.

(e) Find an explicit time domain expression for the output when $v_p(t)\sin 202\pi t$ is passed through
an ideal lowpass filter of bandwidth 4.

Solution: 


Figure  2.31:  Sketch  of  passband  spectrum  for  Example  2.8.4. 


(a) Since $v_p(t)$ is real-valued, we have $V_p(f) = V_p^*(-f)$. Since the spectrum is also given to be
real-valued for $f \le 0$, we have $V_p^*(-f) = V_p(-f)$. The spectrum is sketched in Figure 2.31.

(b) Yes, $v_p(t) = v_p(-t)$. Since $v_p(t)$ is real-valued, we have $v_p(-t) = v_p^*(-t) \leftrightarrow V_p^*(f)$. But
$V_p^*(f) = V_p(f)$, since the spectrum is real-valued.

(c) The spectrum of the complex envelope and the I and Q components are shown in Figure 2.32.
The complex envelope is obtained as $V(f) = 2V_p^+(f + f_c)$, where $V_p^+(f)$ is the positive frequency
part of $V_p(f)$, while the I and Q components satisfy

$$V_c(f) = \frac{1}{2}\left(V(f) + V^*(-f)\right), \quad V_s(f) = \frac{1}{2j}\left(V(f) - V^*(-f)\right)$$

In our case, $V_c(f) = |f|I_{[-2,2]}(f)$ and $jV_s(f) = fI_{[-2,2]}(f)$ are real-valued, and are plotted in the
figure.

(d) The output of the LPF is $v_c(t)/2$, where $v_c$ is the I component with respect to $f_c = 100$. In
Figure 2.33, we construct the complex envelope and the I component as in (c), except that the
reference frequency is different. Clearly, the boxcar spectrum corresponds to $v_c(t) = 4\,\mathrm{sinc}(2t)$,
so that the output is $2\,\mathrm{sinc}(2t)$.

(e) The output of the LPF is $-v_s(t)/2$, where $v_s$ is the Q component with respect to $f_c = 101$. In
Figure 2.34, we construct the complex envelope and the Q component as in (c), except that the
reference frequency is different. We now have to take the inverse Fourier transform, which is a
little painful if we do it from scratch. Instead, let us differentiate to see that

$$j\frac{dV_s(f)}{df} = I_{[-2,2]}(f) - 4\delta(f) \;\leftrightarrow\; 4\,\mathrm{sinc}(4t) - 4$$

But $\frac{dV_s(f)}{df} \leftrightarrow -j2\pi t\,v_s(t)$, so that $j\frac{dV_s(f)}{df} \leftrightarrow 2\pi t\,v_s(t)$. We therefore obtain that
$2\pi t\,v_s(t) = 4\,\mathrm{sinc}(4t) - 4$, or $v_s(t) = \frac{2(\mathrm{sinc}(4t) - 1)}{\pi t}$. Thus, the output of the LPF is
$-v_s(t)/2$, or $\frac{1 - \mathrm{sinc}(4t)}{\pi t}$.
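
The differentiation shortcut in part (e) can be cross-checked by brute force. Dividing $2\pi t\,v_s(t) = 4\,\mathrm{sinc}(4t) - 4$ by $2\pi t$ gives $v_s(t) = 2(\mathrm{sinc}(4t) - 1)/(\pi t)$. The sketch below compares this against a direct numerical inverse Fourier transform. The piecewise form of $jV_s(f)$ used here ($f + 2$ on $[-2, 0)$ and $f - 2$ on $(0, 2]$) is worked out from the given $V_p(f)$ with reference $f_c = 101$ and should be read as an assumption of this sketch, as should the midpoint-rule integration.

```python
import math

def sinc(x):
    """Normalized sinc, sin(pi x)/(pi x), with sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def j_vs_spectrum(f):
    """jV_s(f) for reference fc = 101 (assumed form): real-valued and odd in f."""
    if -2.0 <= f < 0.0:
        return f + 2.0
    if 0.0 < f <= 2.0:
        return f - 2.0
    return 0.0

def vs_numeric(t, n=20000):
    """Inverse transform of V_s(f) by the midpoint rule over [-2, 2].

    With g(f) = jV_s(f) real and odd, V_s = -j g, and the inverse transform
    reduces to the real integral of g(f) sin(2 pi f t) df.
    """
    df = 4.0 / n
    total = 0.0
    for k in range(n):
        f = -2.0 + (k + 0.5) * df
        total += j_vs_spectrum(f) * math.sin(2 * math.pi * f * t)
    return total * df

def vs_closed_form(t):
    """Closed form obtained from 2 pi t v_s(t) = 4 sinc(4t) - 4 (valid for t != 0)."""
    return 2.0 * (sinc(4.0 * t) - 1.0) / (math.pi * t)

for t in (0.3, -0.7, 1.25):
    print(f"t = {t:5.2f}: numeric {vs_numeric(t):.5f}, closed form {vs_closed_form(t):.5f}")
```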
Figure 2.32: Sketch of I and Q spectra in Example 2.8.4(c), taking reference frequency $f_c = 99$.

Figure 2.33: Finding the I component in Example 2.8.4(d), taking reference frequency as
$f_c = 100$.

Figure 2.34: Finding the Q component in Example 2.8.4(e), taking reference frequency as
$f_c = 101$.

Figure 2.35: The relationship between passband filtering and its complex baseband analogue.

Figure 2.36: Complex baseband realization of passband filter. The constant scale factors of 1/2
have been omitted.


2.8.3 Complex Baseband Equivalent of Passband Filtering


We now state another result that is extremely relevant to transceiver operations; namely, any
passband filter can be implemented in complex baseband. This result applies to filtering
operations that we desire to perform at the transmitter (e.g., to conform to spectral masks), at
the receiver (e.g., to filter out noise), and to a broad class of channels modeled as linear
filters. Suppose that a passband signal $u_p(t) = u_c(t)\cos 2\pi f_c t - u_s(t)\sin 2\pi f_c t$ is passed through
a passband filter with impulse response $h_p(t) = h_c(t)\cos 2\pi f_c t - h_s(t)\sin 2\pi f_c t$ to get an output
$y_p(t) = (u_p * h_p)(t)$. In the frequency domain, $Y_p(f) = H_p(f)U_p(f)$, so that the output $y_p(t)$ is
also passband, and can be written as $y_p(t) = y_c(t)\cos 2\pi f_c t - y_s(t)\sin 2\pi f_c t$. How are the I and Q
components of the output related to those of the input and the filter impulse response? We now
show that a compact answer is given in terms of complex envelopes: the complex envelope $y$ is the
convolution of the complex envelopes of the input and the impulse response, up to a scale factor.
Let $y$, $u$ and $h$ denote the complex envelopes for $y_p$, $u_p$ and $h_p$, respectively, with respect to a
common frequency reference $f_c$. Since real-valued passband signals are completely characterized
by their spectra for positive frequencies, the passband filtering equation $Y_p(f) = U_p(f)H_p(f)$
can be separately (and redundantly) written out for positive and negative frequencies, because
the spectra are conjugate symmetric around the origin, and there is no energy around $f = 0$.
Thus, focusing on the positive frequency segments $Y_+(f) = Y_p(f)I_{\{f>0\}}$, $U_+(f) = U_p(f)I_{\{f>0\}}$,
$H_+(f) = H_p(f)I_{\{f>0\}}$, we have $Y_+(f) = U_+(f)H_+(f)$, from which we conclude that the spectrum of
the complex envelope of $y_p$ is given by

$$Y(f) = 2Y_+(f + f_c) = 2U_+(f + f_c)H_+(f + f_c) = \frac{1}{2}U(f)H(f)$$

Figure 2.35 depicts the relationship between the passband and complex baseband waveforms
in the frequency domain, and supplies a pictorial proof of the preceding relationship. We now
restate this important result in the time domain:

$$y(t) = \frac{1}{2}(u * h)(t) \tag{2.84}$$

A practical consequence of this is that any desired passband filtering function can be realized in
complex baseband. As shown in Figure 2.36, this requires four real baseband filters: writing out
the real and imaginary parts of (2.84), we obtain

$$y_c = \frac{1}{2}\left(u_c * h_c - u_s * h_s\right), \quad y_s = \frac{1}{2}\left(u_s * h_c + u_c * h_s\right) \tag{2.85}$$

Figure 2.37: Convolution of two boxes for Example 2.8.5.


Example 2.8.5 The passband signal $u(t) = I_{[-1,1]}(t)\cos 100\pi t$ is passed through the passband
filter $h(t) = I_{[0,3]}(t)\sin 100\pi t$. Find an explicit time domain expression for the filter output.
Solution: We need to find the convolution $y_p(t)$ of the signal $u_p(t) = I_{[-1,1]}(t)\cos 100\pi t$ with the
impulse response $h_p(t) = I_{[0,3]}(t)\sin 100\pi t$, where we have inserted the subscript to explicitly
denote that the signals are passband. The corresponding relationship in complex baseband is
$y = (1/2)u * h$. Taking a reference frequency $f_c = 50$, we can read off the complex envelopes
$u(t) = I_{[-1,1]}(t)$ and $h(t) = -jI_{[0,3]}(t)$, so that

$$y(t) = \left(-j/2\right)I_{[-1,1]}(t) * I_{[0,3]}(t)$$

Let $s(t) = (1/2)I_{[-1,1]}(t) * I_{[0,3]}(t)$ denote the trapezoid obtained by convolving the two boxes, as
shown in Figure 2.37. Then

$$y(t) = -js(t)$$

That is, $y_c = 0$ and $y_s = -s(t)$, so that $y_p(t) = s(t)\sin 100\pi t$.
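
The complex baseband answer can be compared against a direct passband convolution. The sketch below is illustrative only: the midpoint-rule integration, the step size, the sample times and the helper names are assumptions. It evaluates $y_p(t)$ as an integral of $u_p(\tau)h_p(t - \tau)$ at a few time instants and compares with the real part of $y(t)e^{j100\pi t}$ for $y(t) = -js(t)$.

```python
import cmath
import math

FC = 50.0  # reference frequency in Hz, so that 2 pi FC t = 100 pi t

def box(t, a, b):
    """Indicator function of the interval [a, b]."""
    return 1.0 if a <= t <= b else 0.0

def u_p(t):
    """Passband input: box on [-1, 1] times cos(100 pi t)."""
    return box(t, -1.0, 1.0) * math.cos(2 * math.pi * FC * t)

def h_p(t):
    """Passband impulse response: box on [0, 3] times sin(100 pi t)."""
    return box(t, 0.0, 3.0) * math.sin(2 * math.pi * FC * t)

def passband_output(t, dt=1e-4):
    """Direct convolution (u_p * h_p)(t), midpoint rule over the support of u_p."""
    n = int(round(2.0 / dt))
    total = 0.0
    for k in range(n):
        tau = -1.0 + (k + 0.5) * dt
        total += u_p(tau) * h_p(t - tau)
    return total * dt

def s(t):
    """Trapezoid: half the overlap length of [-1, 1] and [t - 3, t]."""
    overlap = min(1.0, t) - max(-1.0, t - 3.0)
    return 0.5 * max(0.0, overlap)

def baseband_prediction(t):
    """Passband output predicted from the complex envelope y(t) = -j s(t)."""
    y = complex(0.0, -s(t))
    return (y * cmath.exp(1j * 2 * math.pi * FC * t)).real

for t in (0.305, 1.505, 3.005):
    print(f"t = {t}: direct {passband_output(t):.4f}, via baseband {baseband_prediction(t):.4f}")
```

The sample times are chosen so that $\sin 100\pi t = \pm 1$, which makes the comparison sensitive to the trapezoid height on its rising, flat and falling segments.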


2.8.4  General  Comments  on  Complex  Baseband 

Remark 2.8.1 (Complex Baseband in Transceiver Implementations) Given the
equivalence of passband and complex baseband, and the fact that key operations such as linear filtering
can be performed in complex baseband, it is understandable why, in typical modern passband
transceivers, most of the intelligence is moved to baseband processing. For moderate bandwidths
at which analog-to-digital and digital-to-analog conversion can be accomplished inexpensively,
baseband operations can be efficiently performed in DSP. These digital algorithms are
independent of the passband over which communication eventually occurs, and are amenable to a variety
of low-cost implementations, including Very Large Scale Integrated Circuits (VLSI), Field
Programmable Gate Arrays (FPGA), and general purpose DSP engines. On the other hand, analog
components such as local oscillators, power amplifiers and low noise amplifiers must be
optimized for the bands of interest, and are often bulky. Thus, the trend in modern transceivers is
to accomplish as much as possible using baseband DSP algorithms. For example, complicated
filters shaping the transmitted waveform to a spectral mask dictated by the FCC can be achieved
with baseband DSP algorithms, allowing the use of relatively sloppy analog filters at passband.
Another example is the elimination of analog phase locked loops for carrier synchronization in
many modern receivers; the receiver instead employs a fixed analog local oscillator for
downconversion, followed by a digital phase locked loop, or a one-shot carrier frequency/phase estimate,
implemented in complex baseband.

Energy  and  power:  The  energy  of  a  passband  signal  equals  that  of  its  complex  envelope,  up 
to  a  scale  factor  which  depends  on  the  particular  convention  we  adopt.  In  particular,  for  the 
convention  in  (2.68),  we  have 


$$ \|u_p\|^2 = \tfrac{1}{2}\left(\|u_c\|^2 + \|u_s\|^2\right) = \tfrac{1}{2}\,\|u\|^2 \tag{2.86} $$


That  is,  the  energy  equals  the  sum  of  the  energies  of  the  I  and  Q  components,  up  to  a  scalar 
constant.  The  same  relationship  holds  for  the  powers  of  finite-power  passband  signals  and  their 
complex  envelopes,  since  power  is  computed  as  a  time  average  of  energy.  To  show  (2.86),  consider 


$$ \|u_p\|^2 = \int \left(u_c(t) \cos 2\pi f_c t - u_s(t) \sin 2\pi f_c t\right)^2 dt $$

$$ = \int u_c^2(t) \cos^2(2\pi f_c t)\, dt + \int u_s^2(t) \sin^2(2\pi f_c t)\, dt - 2 \int u_c(t) \cos 2\pi f_c t \; u_s(t) \sin 2\pi f_c t \, dt $$


The  I-Q  cross  term  drops  out  due  to  I-Q  orthogonality,  so  that  we  are  left  with  the  I-I  and  Q-Q 
terms,  as  follows: 


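$$ \|u_p\|^2 = \int u_c^2(t) \cos^2(2\pi f_c t)\, dt + \int u_s^2(t) \sin^2(2\pi f_c t)\, dt $$

Now, $\cos^2 2\pi f_c t = \frac{1}{2} + \frac{1}{2} \cos 4\pi f_c t$ and $\sin^2 2\pi f_c t = \frac{1}{2} - \frac{1}{2} \cos 4\pi f_c t$. We therefore obtain

$$ \|u_p\|^2 = \frac{1}{2} \int u_c^2(t)\, dt + \frac{1}{2} \int u_s^2(t)\, dt + \frac{1}{2} \int u_c^2(t) \cos 4\pi f_c t \, dt - \frac{1}{2} \int u_s^2(t) \cos 4\pi f_c t \, dt $$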


The  last  two  terms  are  zero,  since  they  are  equal  to  the  DC  components  of  passband  waveforms 
centered  around  2  fc,  arguing  in  exactly  the  same  fashion  as  in  our  derivation  of  I-Q  orthogonality. 
This  gives  the  desired  result  (2.86). 
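
The energy relation (2.86) is easy to confirm numerically. The sketch below assumes Gaussian-shaped I and Q components and a carrier frequency of 20; these choices and the variable names are illustrative and not from the text. What matters is that the baseband bandwidth is small compared with the carrier frequency.

```matlab
% Sketch: passband energy versus half the complex-envelope energy
dt = 1e-3;  t = -6:dt:6;  fc = 20;    % grid and carrier (assumed)
uc = exp(-t.^2);                      % I component (assumed)
us = t.*exp(-t.^2);                   % Q component (assumed)
up = uc.*cos(2*pi*fc*t) - us.*sin(2*pi*fc*t);   % passband signal

Ep  = dt*sum(up.^2);                  % ||u_p||^2
Ebb = dt*sum(uc.^2 + us.^2);          % ||u||^2 = ||u_c||^2 + ||u_s||^2
gap = abs(Ep - 0.5*Ebb);              % should be negligible
```

Lowering `fc` until it is comparable to the signal bandwidth makes the gap grow, because the double-frequency and cross terms then no longer average out.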

Correlation  between  two  signals:  The  correlation,  or  inner  product,  of  two  real-valued 
passband  signals  up  and  vp  is  defined  as 

$$ \langle u_p, v_p \rangle = \int_{-\infty}^{\infty} u_p(t)\, v_p(t)\, dt $$


Using  exactly  the  same  reasoning  as  above,  we  can  show  that 

$$ \langle u_p, v_p \rangle = \tfrac{1}{2}\left(\langle u_c, v_c \rangle + \langle u_s, v_s \rangle\right) \tag{2.87} $$


That  is,  we  can  implement  a  passband  correlation  by  first  downconverting,  and  then  employing 
baseband  operations:  correlating  I  against  I,  and  Q  against  Q,  and  then  summing  the  results.  It 
is  also  worth  noting  how  this  is  related  to  the  complex  baseband  inner  product,  which  is  defined 
as 

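$$ \langle u, v \rangle = \int_{-\infty}^{\infty} u(t)\, v^*(t)\, dt = \int_{-\infty}^{\infty} \left(u_c(t) + j u_s(t)\right)\left(v_c(t) - j v_s(t)\right) dt $$

$$ = \left(\langle u_c, v_c \rangle + \langle u_s, v_s \rangle\right) + j \left(\langle u_s, v_c \rangle - \langle u_c, v_s \rangle\right) $$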

Comparing  with  (2.87),  we  obtain  that 


$$ \langle u_p, v_p \rangle = \tfrac{1}{2}\, \mathrm{Re}\left(\langle u, v \rangle\right) $$

That  is,  the  passband  inner  product  is  the  real  part  of  the  complex  baseband  inner  product  (up  to 
scale  factor).  Does  the  imaginary  part  of  the  complex  baseband  inner  product  have  any  meaning? 
Indeed  it  does:  it  becomes  important  when  there  is  phase  uncertainty  in  the  downconversion 
operation,  which  causes  the  I  and  Q  components  to  leak  into  each  other.  However,  we  postpone 
discussion  of  such  issues  to  later  chapters. 
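
The relation between the passband and complex baseband inner products can be verified in the same way as the energy relation. In this sketch the two complex envelopes, the carrier frequency, and all variable names are assumptions made for illustration.

```matlab
% Sketch: <u_p, v_p> compared with (1/2) Re <u, v>
dt = 1e-3;  t = -6:dt:6;  fc = 20;             % grid and carrier (assumed)
u  = exp(-t.^2) + 1j*t.*exp(-t.^2);            % complex envelope of u_p (assumed)
v  = exp(-(t-0.5).^2) .* exp(1j*pi/3);         % complex envelope of v_p (assumed)

c  = cos(2*pi*fc*t);  s = sin(2*pi*fc*t);
up = real(u).*c - imag(u).*s;                  % upconvert u
vp = real(v).*c - imag(v).*s;                  % upconvert v

ip_pass = dt*sum(up.*vp);                      % passband inner product
ip_bb   = dt*sum(u.*conj(v));                  % complex baseband inner product
gap     = abs(ip_pass - 0.5*real(ip_bb));      % should be negligible
```

For these particular signals `imag(ip_bb)` is nonzero, even though it plays no role in the passband inner product under ideal synchronization.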


2.9  Wireless  Channel  Modeling  in  Complex  Baseband 

We  now  provide  a  glimpse  of  wireless  channel  modeling  using  complex  baseband.  There  are  two 
key  differences  between  wireless  and  wireline  communication.  The  first,  which  is  what  we  focus 
on  now,  is  multipath  propagation  due  to  reflections  off  of  scatterers  adding  up  at  the  receiver. 
This  addition  can  be  constructive  or  destructive  (as  we  saw  in  Example  2.5.6),  and  is  sensitive  to 
small  changes  in  the  relative  location  of  the  transmitter  and  receiver  which  produce  changes  in 
the  relative  delays  of  the  various  paths.  The  resulting  fluctuations  in  signal  strength  are  termed 
fading.  The  second  key  feature  of  wireless,  which  we  explore  in  a  different  wireless  module, 
is  interference:  wireless  is  a  broadcast  medium,  hence  the  receiver  can  also  hear  transmissions 
other  than  the  one  it  is  interested  in.  We  now  explore  the  effects  of  multipath  fading  for  some 
simple  scenarios.  While  we  just  made  up  the  example  impulse  response  in  Example  2.5.6,  we 
now  consider  more  detailed,  but  still  simplified,  models  of  the  propagation  environment  and  the 
associated  channel  models. 
Consider a passband transmitted signal at carrier frequency $f_c$, of the form

$$ u_p(t) = u_c(t) \cos 2\pi f_c t - u_s(t) \sin 2\pi f_c t = e(t) \cos(2\pi f_c t + \theta(t)) $$

where

$$ u(t) = u_c(t) + j u_s(t) = e(t) e^{j\theta(t)} $$

is the complex baseband representation, or complex envelope. In order to model the propagation
of this signal through a multipath environment, let us consider its propagation through a path
of length $r$. The propagation attenuates the field by a factor of $1/r$, and introduces a delay of
$\tau(r) = r/c$, where $c$ denotes the speed of light. Suppressing the dependence of $\tau$ on $r$, the received
signal is given by

$$ v_p(t) = \frac{A}{r}\, e(t - \tau) \cos\left(2\pi f_c (t - \tau) + \theta(t - \tau) - \phi\right) $$

where we consider relative values (across paths) for the constants $A$ and $\phi$. The complex envelope
of $v_p(t)$ with respect to the reference $e^{j2\pi f_c t}$ is given by

$$ v(t) = \frac{A}{r}\, u(t - \tau)\, e^{-j(2\pi f_c \tau + \phi)} \tag{2.89} $$

For example, we may take $A = 1$, $\phi = 0$ for a direct, or line of sight (LOS), path from transmitter
to receiver, which we may take as a reference. Figure 2.38 shows the geometry for a reflected
path corresponding to a single bounce, relative to the LOS path. Following standard terminology, $\theta_i$
denotes the angle of incidence, and $\theta_g = \frac{\pi}{2} - \theta_i$ the grazing angle. The change in relative amplitude
and phase due to the reflection depends on the carrier frequency, the reflector material, the angle
of incidence, and the polarization with respect to the orientation of the reflector surface. Since
we do not wish to get into the underlying electromagnetics, we consider simplified models of
relative amplitude and phase. In particular, we note that for grazing incidence ($\theta_g \approx 0$), we have
$A \approx 1$, $\phi \approx \pi$.


[Figure 2.38 labels: Reflector; Range R]

Figure  2.38:  Ray  tracing  for  a  single  bounce  path.  We  can  reflect  the  transmitter  around  the 
reflector  to  create  a  virtual  source.  The  line  between  the  virtual  source  and  the  receiver  tells  us 
where  the  ray  will  hit  the  reflector,  following  the  law  of  reflection  that  the  angles  of  incidence 
and  reflection  must  be  equal.  The  length  of  the  line  equals  the  length  of  the  reflected  ray  to  be 
plugged  into  (2.92). 

Generalizing (2.89) to multiple paths of length $r_1, r_2, \ldots$, the complex envelope of the received
signal is given by

$$ v(t) = \sum_i \frac{A_i}{r_i}\, u(t - \tau_i)\, e^{-j(2\pi f_c \tau_i + \phi_i)} \tag{2.90} $$

where $\tau_i = r_i/c$, and $A_i$, $\phi_i$ depend on the reflector characteristic and incidence angle for the $i$th
ray. This corresponds to the complex baseband channel impulse response

$$ h(t) = \sum_i \frac{A_i}{r_i}\, e^{-j(2\pi f_c \tau_i + \phi_i)}\, \delta(t - \tau_i) \tag{2.91} $$

This is in exact correspondence with our original multipath model (2.36), with $\alpha_i = \frac{A_i}{r_i} e^{-j(2\pi f_c \tau_i + \phi_i)}$.
The corresponding frequency domain response is given by

$$ H(f) = \sum_i \frac{A_i}{r_i}\, e^{-j(2\pi f_c \tau_i + \phi_i)}\, e^{-j2\pi f \tau_i} \tag{2.92} $$

Since we are modeling in complex baseband, $f$ takes values around DC, with $f = 0$ corresponding
to the passband reference frequency $f_c$.

Channel delay spread and coherence bandwidth: We have already introduced these concepts
in Example 2.5.6, but reiterate them here. Let $\tau_{\min}$ and $\tau_{\max}$ denote the minimum and
maximum of the delays $\{\tau_i\}$. The difference $\tau_d = \tau_{\max} - \tau_{\min}$ is called the channel delay spread.
The reciprocal of the delay spread is termed the channel coherence bandwidth, $B_c = \frac{1}{\tau_d}$. A baseband
signal of bandwidth $W$ is said to be narrowband if $W \tau_d = W/B_c \ll 1$, or equivalently, if
its bandwidth is significantly smaller than the channel coherence bandwidth.

We can now infer that, for a narrowband signal around the reference frequency, the received
complex baseband signal equals a delayed version of the transmitted signal, scaled by the complex
channel gain

$$ h = H(0) = \sum_i \frac{A_i}{r_i}\, e^{-j(2\pi f_c \tau_i + \phi_i)} \tag{2.93} $$
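
The narrowband approximation can be illustrated by evaluating (2.92) over a band and comparing it with the single gain in (2.93). The following is a sketch with a made-up three-path channel: the path lengths, amplitudes, phases, and carrier frequency are all assumptions. The common delay $\tau_{\min}$ is removed before comparing, which is one reasonable way to account for the "delayed version" in the statement above.

```matlab
% Sketch: how well does H(f) match the flat gain h = H(0) over a band of width W?
c   = 3e8;  fc = 1e9;                 % propagation speed, carrier (assumed)
r   = [100 103 110];                  % path lengths in meters (assumed)
A   = [1 0.8 0.5];                    % relative amplitudes (assumed)
phi = [0 pi pi/2];                    % relative phases (assumed)

tau   = r/c;                                        % path delays
alpha = (A./r).*exp(-1j*(2*pi*fc*tau + phi));       % complex path gains
tau_d = max(tau) - min(tau);                        % delay spread
Bc    = 1/tau_d;                                    % coherence bandwidth
h     = sum(alpha);                                 % channel gain (2.93)

Wlist = [0.001 1]*Bc;                 % a narrowband and a wideband case
dev   = zeros(size(Wlist));
for k = 1:length(Wlist)
    f  = linspace(-Wlist(k)/2, Wlist(k)/2, 201).';  % column of frequencies
    Hf = exp(-1j*2*pi*f*tau) * alpha.';             % H(f) from (2.92)
    G  = Hf .* exp(1j*2*pi*f*min(tau));             % remove the common delay
    dev(k) = max(abs(G - h))/abs(h);                % relative deviation from h
end
```

Here `dev(1)` (for $W = 0.001 B_c$) is small, while `dev(2)` (for $W = B_c$) is not, which is the sense in which the channel looks like a single complex gain only to narrowband signals.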


Example 2.9.1 (Two ray model) Suppose our propagation environment consists of the LOS
ray and the single reflected ray shown in Figure 2.38. Then we have two rays, with
$r_1 = \sqrt{R^2 + (h_r - h_t)^2}$ and $r_2 = \sqrt{R^2 + (h_r + h_t)^2}$. The corresponding delays are $\tau_i = r_i/c$, $i = 1, 2$,
where $c$ denotes the speed of propagation. The grazing angle is given by $\theta_g = \tan^{-1} \frac{h_t + h_r}{R}$. Setting
$A_1 = 1$ and $\phi_1 = 0$, once we specify $A_2$ and $\phi_2$ for the reflected path, we can specify the complex
baseband channel. Numerical examples are explored in Problem 2.21, and in Software Lab 2.2.
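
The two ray model translates directly into a few lines of Matlab. The sketch below uses illustrative numbers that are assumptions, not values from the text: a range of 500 m, antenna heights of 3 m and 15 m, a 2 GHz carrier, and a reflected path with $A_2 = 0.9$ and $\phi_2 = \pi$.

```matlab
% Sketch: two ray channel parameters from the geometry of Figure 2.38
c  = 3e8;                             % propagation speed in m/s
R  = 500;  ht = 3;  hr = 15;          % range and antenna heights in m (assumed)
fc = 2e9;                             % carrier frequency in Hz (assumed)
A  = [1 0.9];  phi = [0 pi];          % LOS reference and reflected path (assumed)

r1 = sqrt(R^2 + (hr - ht)^2);         % LOS path length
r2 = sqrt(R^2 + (hr + ht)^2);         % reflected path length (virtual source)
tau     = [r1 r2]/c;                  % path delays
tau_d   = tau(2) - tau(1);            % delay spread
Bc      = 1/tau_d;                    % coherence bandwidth
theta_g = atan((ht + hr)/R);          % grazing angle in radians

alpha = (A./[r1 r2]).*exp(-1j*(2*pi*fc*tau + phi));   % complex path gains
h     = sum(alpha);                   % narrowband channel gain, as in (2.93)
```

For this geometry the grazing angle is about 2 degrees, which is the regime in which the simplified reflection model $A \approx 1$, $\phi \approx \pi$ is reasonable. The delay spread is a fraction of a nanosecond, so the coherence bandwidth is on the order of a GHz.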


2.10  Concept  Summary 

In addition to a review of basic signals and systems concepts such as convolution and Fourier
transforms, the main focus of this chapter is to develop the complex baseband representation of
passband signals, and to emphasize its crucial role in modeling and implementation of
communication systems.

Review 

• Euler’s formula: $e^{j\theta} = \cos\theta + j \sin\theta$

• Important signals: delta function (sifting property), indicator function, complex exponential,
sinusoid, sinc

•  Signals  analogous  to  vectors:  Inner  product,  energy  and  norm 

•  LTI  systems:  impulse  response,  convolution,  complex  exponentials  as  eigenfunctions,  multipath 
channel  modeling 

• Fourier series: complex exponentials or sinusoids as basis for periodic signals, conjugate
symmetry for real-valued signals, Parseval’s identity, use of differentiation to simplify computation

• Fourier transform: standard pairs (sinc and boxcar, impulse and constant), effect of time
delay and frequency shift, conjugate symmetry for real-valued signals, Parseval’s identity, use of
differentiation to simplify computation, numerical computation using DFT

• Bandwidth: for physical signals, given by occupancy of positive frequencies; energy spectral
density equals magnitude squared of Fourier transform; computation of fractional energy
containment bandwidth from energy spectral density


Complex  baseband  representation 

• Complex envelope of passband signal: rectangular form (I and Q components), polar form
(envelope and phase), upconversion and downconversion, orthogonality of I and Q components
(under ideal synchronization), frequency domain relationship between passband signal and its
complex envelope

•  Passband  filtering  can  be  accomplished  in  complex  baseband 

•  Passband  inner  product  and  energy  in  terms  of  complex  baseband  quantities 

Modeling  in  complex  baseband 

•  Frequency  and  phase  offsets:  rotating  phasor  multiplying  complex  envelope,  derotation  to  undo 
offsets 

• Wireless multipath channel: impulse response modeled as sum of impulses with complex-valued
coefficients, ray tracing, delay spread and coherence bandwidth


2.11  Endnotes 

A detailed treatment of the material reviewed in Sections 2.1-2.5 can be found in basic textbooks
on signals and systems such as Oppenheim, Willsky and Nawab [17] or Lathi [18].

The  Matlab  code  fragments  and  software  labs  interspersed  in  this  textbook  provide  a  glimpse  of 
the  use  of  DSP  in  communication.  However,  for  a  background  in  core  DSP  algorithms,  we  refer 
the  reader  to  textbooks  such  as  Oppenheim  and  Schafer  [19]  and  Mitra  [20]. 


Problems 


LTI  systems  and  Convolution 

Problem  2.1  A  system  with  input  x(t)  has  output  given  by 


$$ y(t) = \int_{-\infty}^{t} e^{u-t}\, x(u)\, du $$


(a)  Show  that  the  system  is  LTI  and  find  its  impulse  response. 

(b)  Find  the  transfer  function  H(f)  and  plot  \H(f)\. 

(c)  If  the  input  x(t)  =  2sinc(2t),  find  the  energy  of  the  output. 


Problem 2.2 Find and sketch $y = x_1 * x_2$ for the following:

(a) $x_1(t) = e^{-t} I_{[0,\infty)}(t)$, $x_2(t) = x_1(-t)$.

(b) $x_1(t) = I_{[0,2]}(t) - 3 I_{[1,4]}(t)$, $x_2(t) = I_{[0,1]}(t)$.

Hint: In (b), you can use the LTI property and the known result in Figure 2.12 on the convolution
of two boxes.

Fourier Series


Problem  2.3  A  digital  circuit  generates  the  following  periodic  waveform  with  period  0.5: 


$$ u(t) = \begin{cases} 1, & 0 \le t < 0.1 \\ 0, & 0.1 \le t < 0.5 \end{cases} $$

where  the  unit  of  time  is  microseconds  throughout  this  problem. 

(a)  Find  the  complex  exponential  Fourier  series  for  du/dt. 

(b)  Find  the  complex  exponential  Fourier  series  for  u(t),  using  the  results  of  (a). 

(c)  Find  an  explicit  time  domain  expression  for  the  output  when  u(t)  is  passed  through  an  ideal 
lowpass  filter  of  bandwidth  100  KHz. 

(d)  Repeat  (c)  when  the  filter  bandwidth  is  increased  to  300  KHz. 

(e) Find an explicit time domain expression for the output when $u(t)$ is passed through a filter
with impulse response $h_2(t) = \mathrm{sinc}(t) \cos(8\pi t)$.

(f) Can you generate a sinusoidal waveform of frequency 1 MHz by appropriately filtering $u(t)$?
If so, specify in detail how you would do it.


Fourier  Transform  and  Bandwidth 

Problem  2.4  Find  and  sketch  the  Fourier  transforms  for  the  following  signals: 

(a) $u(t) = (1 - |t|)\, I_{[-1,1]}(t)$.

(b) $v(t) = \mathrm{sinc}(2t)\, \mathrm{sinc}(4t)$.

(c) $s(t) = v(t) \cos 200\pi t$.

(d)  Classify  each  of  the  signals  in  (a)-(c)  as  baseband  or  passband. 

Problem  2.5  Use  Parseval’s  identity  to  compute  the  following  integrals: 

(a) $\int_{-\infty}^{\infty} \mathrm{sinc}^2(2t)\, dt$

(b) $\int_{0}^{\infty} \mathrm{sinc}(t)\, \mathrm{sinc}(2t)\, dt$


Problem 2.6 (a) For $u(t) = \mathrm{sinc}(t)\, \mathrm{sinc}(2t)$, where $t$ is in microseconds, find and plot the
magnitude spectrum $|U(f)|$, carefully labeling the units of frequency on the x axis.

(b) Now, consider $s(t) = u(t) \cos 200\pi t$. Plot the magnitude spectrum $|S(f)|$, again labeling
the units of frequency and carefully showing the frequency intervals over which the spectrum is
nonzero.


Problem 2.7 The signal $s(t) = \mathrm{sinc}\, 4t$ is passed through a filter with impulse response
$h(t) = \mathrm{sinc}^2 t \cos 4\pi t$ to obtain output $y(t)$. Find and sketch the Fourier transform $Y(f)$ of the output
(sketch the real and imaginary parts separately if the spectrum is complex-valued).

Problem 2.8 Consider the tent signal $s(t) = (1 - |t|)\, I_{[-1,1]}(t)$.

(a)  Find  and  sketch  the  Fourier  transform  S(f). 

(b)  Compute  the  99%  energy  containment  bandwidth  in  KHz,  assuming  that  the  unit  of  time  is 
milliseconds. 


Problem  2.9  Consider  the  cosine  pulse 

$$ p(t) = \cos \pi t \; I_{[-1/2,1/2]}(t) $$

(a) Show that the Fourier transform of this pulse is given by

$$ P(f) = \frac{2 \cos \pi f}{\pi (1 - 4 f^2)} $$


(b)  Use  this  result  to  derive  the  formula  (2.63)  for  the  sine  pulse  in  Example  2.5.7. 


Problem 2.10 (Numerical computation of the Fourier transform) Modify Code Fragment
2.5.1 for Example 2.5.7 to numerically compute the Fourier transform of the tent function
in Problem 2.8. Display the magnitude spectra of the DFT-based numerically computed Fourier
transform and the analytically computed Fourier transform (from Problem 2.8) in the same plot,
over the frequency interval $[-10, 10]$. Comment on the accuracy of the DFT-based computation.


Introducing  the  matched  filter 

Problem 2.11 For a signal $s(t)$, the matched filter is defined as a filter with impulse response
$h(t) = s_{MF}(t) = s^*(-t)$ (we allow signals to be complex valued, since we want to handle complex
baseband signals as well as physical real-valued signals).

(a) Sketch the matched filter impulse response for $s(t) = I_{[1,3]}(t)$.

(b) Find and sketch the convolution $y(t) = (s * s_{MF})(t)$. This is the output when the signal is
passed through its matched filter. Where does the peak of the output occur?

(c) (True or False) $Y(f) \ge 0$ for all $f$.

Problem  2.12  Repeat  Problem  2.11  for  s(t)  =  /[^(t)  —  2/p. 5]  (t). 


Introducing  delay  spread  and  coherence  bandwidth 

Problem 2.13 A wireless channel has impulse response given by $h(t) = 2\delta(t - 0.1) + j\delta(t - 0.64) - 0.8\delta(t - 2.2)$,
where the unit of time is in microseconds.

(a) What is the delay spread and coherence bandwidth?

(b) Plot the magnitude and phase of the channel transfer function $H(f)$ over the interval
$[-2B_c, 2B_c]$, where $B_c$ denotes the coherence bandwidth computed in (a). Comment on how
the phase behaves when $|H(f)|$ is small.

(c) Express $|H(f)|$ in dB, taking 0 dB as the gain of a nominal channel $h_{nom}(t) = 2\delta(t - 0.1)$
corresponding to the first ray alone. What are the fading depths that you see with respect to
this nominal?

Define the average channel power gain over a band $[-W/2, W/2]$ as

$$ G(W) = \frac{1}{W} \int_{-W/2}^{W/2} |H(f)|^2 \, df $$

This is a simplified measure of how increasing signal bandwidth $W$ can help compensate for
frequency-selective fading: we hope that, as $W$ gets large, we can average out fluctuations in
$|H(f)|$.

(d) Plot $G(W)$ as a function of $W/B_c$, and comment on how large the bandwidth needs to be
(as a multiple of $B_c$) to provide “enough averaging.”


Complex  envelope  of  passband  signals 

Problem  2.14  Consider  a  passband  signal  of  the  form 

$$ u_p(t) = a(t) \cos 200\pi t $$

where $a(t) = \mathrm{sinc}(2t)$, and where the unit of time is in microseconds.

(a) What is the frequency band occupied by $u_p(t)$?

(b) The signal $u_p(t) \cos 199\pi t$ is passed through a lowpass filter to obtain an output $b(t)$. Give
an explicit expression for $b(t)$, and sketch $B(f)$ (if $B(f)$ is complex-valued, sketch its real and
imaginary parts separately).

(c) The signal $u_p(t) \sin 199\pi t$ is passed through a lowpass filter to obtain an output $c(t)$. Give
an explicit expression for $c(t)$, and sketch $C(f)$ (if $C(f)$ is complex-valued, sketch its real and
imaginary parts separately).

(d) Can you reconstruct $a(t)$ from simple real-valued operations performed on $b(t)$ and $c(t)$? If
so, sketch a block diagram for the operations required. If not, say why not.


[Figure 2.39 label: $2 \sin(400\pi t + \pi/4)$]


Figure  2.39:  Operations  involved  in  Problem  2.15. 


Problem 2.15 Consider the signal $s(t) = I_{[-1,1]}(t) \cos 400\pi t$.

(a) Find and sketch the baseband signal $u(t)$ that results when $s(t)$ is downconverted as shown
in the upper branch of Figure 2.39.

(b) The signal $s(t)$ is passed through the bandpass filter with impulse response
$h(t) = I_{[0,1]}(t) \sin(400\pi t + \frac{\pi}{4})$. Find and sketch the baseband signal $v(t)$ that results when the
filter output $y(t) = (s * h)(t)$ is downconverted as shown in the lower branch of Figure 2.39.

Problem 2.16 Consider the signals $u_1(t) = I_{[0,1]}(t) \cos 100\pi t$ and $u_2(t) = I_{[0,1]}(t) \sin 100\pi t$.

(a) Find the numerical value of the inner product $\int_{-\infty}^{\infty} u_1(t)\, u_2(t)\, dt$.

(b) Find an explicit time domain expression for the convolution $y(t) = (u_1 * u_2)(t)$.

(c)  Sketch  the  magnitude  spectrum  \Y(f)\  for  the  convolution  in  (b). 


Problem 2.17 Consider a real-valued passband signal $v_p(t)$ whose Fourier transform for positive
frequencies is given by

$$ \mathrm{Re}(V_p(f)) = \begin{cases} 2, & 30 \le f \le 32 \\ 0, & 0 \le f < 30 \\ 0, & 32 < f < \infty \end{cases} $$

$$ \mathrm{Im}(V_p(f)) = \begin{cases} 1 - |f - 32|, & 31 \le f \le 33 \\ 0, & 0 \le f < 31 \\ 0, & 33 < f < \infty \end{cases} $$

(a) Sketch the real and imaginary parts of $V_p(f)$ for both positive and negative frequencies.

(b) Specify, in both the time domain and the frequency domain, the waveform that you get when
you pass $v_p(t) \cos(60\pi t)$ through a low pass filter.

Problem 2.18 The passband signal $u(t) = I_{[-1,1]}(t) \cos 100\pi t$ is passed through the passband
filter $h(t) = I_{[0,3]}(t) \sin 100\pi t$. Find an explicit time domain expression for the filter output.

Problem 2.19 Consider the passband signal $u_p(t) = \mathrm{sinc}(t) \cos 20\pi t$, where the unit of time is
in microseconds.

(a)  Use  Matlab  to  plot  the  signal  (plot  over  a  large  enough  time  interval  so  as  to  include  “most” 
of  the  signal  energy).  Label  the  units  on  the  time  axis. 

Remark:  Since  you  will  be  plotting  a  discretized  version,  the  sampling  rate  you  should  choose 
should  be  large  enough  that  the  carrier  waveform  looks  reasonably  smooth  (e.g.,  a  rate  of  at  least 
10  times  the  carrier  frequency). 

(b) Write a Matlab program to implement a simple downconverter as follows. Pass
$x(t) = 2 u_p(t) \cos 20\pi t$ through a lowpass filter which consists of computing a sliding window average
over a window of 1 microsecond. That is, the LPF output is given by $y(t) = \int_{t-1}^{t} x(\tau)\, d\tau$. Plot
the output and comment on whether it is what you expect to see.


Problem  2.20  Consider  the  following  two  passband  signals: 


$$ u_p(t) = \mathrm{sinc}(2t) \cos 100\pi t $$

and

$$ v_p(t) = \mathrm{sinc}(t) \sin\left(101\pi t + \frac{\pi}{4}\right) $$

(a) Find the complex envelopes $u(t)$ and $v(t)$ for $u_p$ and $v_p$, respectively, with respect to the
frequency reference $f_c = 50$.

(b) What is the bandwidth of $u_p(t)$? What is the bandwidth of $v_p(t)$?

(c)  Find  the  inner  product  (up,vp),  using  the  result  in  (a). 

(d)  Find  the  convolution  yp(t)  =  (up  *  vp)(t),  using  the  result  in  (a). 


Wireless  channel  modeling 

Problem  2.21  Consider  the  two-ray  wireless  channel  model  in  Example  2.9.1. 

(a) Show that, as long as the range $R \gg h_t, h_r$, the delay spread is well approximated as

$$ \tau_d \approx \frac{2 h_t h_r}{R c} $$

where $c$ denotes the propagation speed. We assume free space propagation with $c = 3 \times 10^8$ m/s.

(b) Compare the approximation in (a) with the actual value of the delay spread for $R = 200$ m,
$h_t = 2$ m, $h_r = 10$ m (e.g., modeling an outdoor link with LOS and single ground bounce).

(c) What is the coherence bandwidth for the numerical example in (b)?

(d)  Redo  (b)  and  (c)  for  R  =  10m,  ht  =  hr  =  2m  (e.g.,  a  model  for  an  indoor  link  modeling  LOS 
plus  a  single  wall  bounce). 


Problem 2.22 Consider $R = 200$ m, $h_t = 2$ m, $h_r = 10$ m in the two-ray wireless channel model
in Example 2.9.1. Assume $A_1 = 1$ and $\phi_1 = 0$, set $A_2 = 0.95$ and $\phi_2 = \pi$, and assume that the
carrier frequency is 5 GHz.

(a) Specify the channel impulse response, normalizing the LOS path to unit gain and zero delay.
Make sure you specify the unit of time being used.

(b) Plot the magnitude and phase of the channel transfer function over $[-3B_c, 3B_c]$, where $B_c$
denotes the channel coherence bandwidth.

(c) Plot the frequency selective fading gain in dB over $[-3B_c, 3B_c]$, using a LOS channel as
nominal. Comment on the fading depth.

(d) As in Problem 2.13, compute the frequency-averaged power gain $G(W)$ and plot it as a
function of $W/B_c$. How much bandwidth is needed to average out the effects of
frequency-selective fading?


Software  Lab  2.0:  Signals  and  Systems  Computations  using  Matlab 

Lab  Objectives:  The  goal  of  this  lab  is  to  gain  familiarity  with  computations  and  plots  with 
Matlab,  and  to  reinforce  key  concepts  in  signals  and  systems.  The  questions  are  chosen  to 
illustrate  how  we  can  emulate  continuous  time  operations  using  the  discrete  time  framework 
provided  by  Matlab. 

Reading:  Sections  2.2,  2.3,  2.5  (basic  material  on  signals  and  systems). 

Laboratory  Assignment 


Functions  and  Plots 

1(a) Write a Matlab function signalx that evaluates the following signal at an arbitrary set of
points:

$$ x(t) = \begin{cases} 2e^{t+2}, & -3 \le t \le -1 \\ 2e^{-t} \cos 2\pi t, & -1 < t \le 4 \\ 0, & \text{else} \end{cases} $$

That is, given an input vector of time points, the function should give an output vector with the
values of $x$ evaluated at those time points. For time points falling outside $[-3, 4]$, the function
should return the value zero.

(b) Use the function signalx to plot $x(t)$ versus $t$, for $-6 \le t \le 6$. To do this, create a vector
of sampling times spaced closely enough to get a smooth plot. Generate a corresponding vector
using signalx. Then plot one against the other.

(c) Use the function signalx to plot $x(t - 3)$ versus $t$.

(d) Use the function signalx to plot $x(3 - t)$ versus $t$.

(e) Use the function signalx to plot $x(2t)$ versus $t$.


Convolution 

2(a)  Write  a  Matlab  function  contconv  that  computes  an  approximation  to  continuous-time 
convolution  as  follows. 

Inputs: Vectors $x_1$ and $x_2$ representing samples of two signals to be convolved. Scalars $t_1$, $t_2$
and $dt$, representing the starting time for the samples of $x_1$, the starting time for the samples in
$x_2$, and the spacing of the samples.

Outputs:  Vectors  y  and  t,  corresponding  to  the  samples  of  the  convolution  output  and  the 
sampling  times. 

(b) Check that your function works by using it to convolve two boxes, $3I_{[-2,-1]}$ and $4I_{[1,3]}$, to get
a trapezoid (e.g., using the following code fragment):

dt=0.01; %sample spacing
s1 = -2:dt:-1; %sampling times over the interval [-2,-1]
s2 = 1:dt:3; %sampling times over the interval [1,3]
x1=3*ones(length(s1),1); %samples for first box
x2=4*ones(length(s2),1); %samples for second box
[y,t]=contconv(x1,x2,s1(1),s2(1),dt);
figure(1);
plot(t,y);

Check  that  the  trapezoid  you  get  spans  the  correct  interval  (based  on  the  analytical  answer)  and 
has  the  correct  scaling. 

Matched  filter 

3(a) Consider the signal $u(t) = 2I_{[1,3]}(t) - 3I_{[2,4]}(t)$. Plot $u(t)$ and its matched filter $u_{MF}(t) = u(-t)$
on the same plot.

(b) Use the function contconv to convolve $u(t)$ and $u_{MF}(t)$. Plot the result of the convolution.
Where is the peak of the signal?

(c) Now, consider a complex-valued signal $s(t) = u(t) + jv(t)$, where $v(t) = I_{[-1,2]}(t) + 2I_{[0,1]}(t)$.
The matched filter is given by $s_{MF}(t) = s^*(-t)$. Plot the real parts of $s(t)$ and $s_{MF}(t)$ on one
plot, and the imaginary parts on another.

(d) Use the function contconv to convolve $s(t)$ and $s_{MF}(t)$. Plot the real part, the imaginary part,
and the magnitude of the output. Do you see a peak?

(e) Now, use the function contconv to convolve $s_1(t) = s(t - t_0) e^{j\theta}$ and $s_{MF}(t)$, for $t_0 = 2$ and
$\theta = \frac{\pi}{4}$. Plot the real part, the imaginary part, and the magnitude of the output. Do you see a
peak?

(f) If you did not know $t_0$ and $\theta$, could you estimate them from the output of the convolution in (e)?
Try out some ideas and report on the results.

Fourier  transform 

The  following  Matlab  function  is  a  modification  of  Code  Fragment  2.5.1. 


function [X,f,df] = contFT(x,tstart,dt,df_desired)
%Use Matlab DFT for approximate computation of continuous time Fourier
%transform
%INPUTS
%x = vector of time domain samples, assumed uniformly spaced
%    (use a row vector, so that X and f below have matching shapes)
%tstart = time at which first sample is taken
%dt = spacing between samples
%df_desired = desired frequency resolution
%OUTPUTS
%X = vector of samples of Fourier transform
%f = corresponding vector of frequencies at which samples are obtained
%df = freq resolution attained (redundant: already available from
%difference of consecutive entries of f)
%%%%%%%%%
%minimum FFT size determined by desired freq res or length of x
Nmin=max(ceil(1/(df_desired*dt)),length(x));
%choose FFT size to be the next power of 2
Nfft = 2^(nextpow2(Nmin))
%compute Fourier transform, centering around DC
X=dt*fftshift(fft(x,Nfft));
%achieved frequency resolution
df=1/(Nfft*dt)
%range of frequencies covered
f = ((0:Nfft-1)-Nfft/2)*df; %same as f=-1/(2*dt):df:1/(2*dt) - df
%phase shift associated with start time
X=X.*exp(-j*2*pi*f*tstart);
end
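
Before using contFT on the signals in the lab, it can help to try the same steps on a signal whose transform is known in closed form. The sketch below is an illustration rather than part of the lab. It repeats the steps of contFT inline so that it runs on its own, and applies them to the Gaussian pulse $e^{-\pi t^2}$, whose Fourier transform is $e^{-\pi f^2}$. The time grid and the desired resolution are assumed values.

```matlab
% Sketch: the steps of contFT applied to a Gaussian pulse with a known transform
dt = 0.01;  tstart = -5;              % sample spacing and start time (assumed)
t  = tstart:dt:5;                     % time grid
x  = exp(-pi*t.^2);                   % samples of the pulse (row vector)
df_desired = 0.01;                    % desired frequency resolution (assumed)

Nmin = max(ceil(1/(df_desired*dt)), length(x));   % minimum FFT size
Nfft = 2^nextpow2(Nmin);                          % next power of 2
X    = dt*fftshift(fft(x, Nfft));                 % DFT, centered around DC
df   = 1/(Nfft*dt);                               % achieved resolution
f    = ((0:Nfft-1) - Nfft/2)*df;                  % frequency grid
X    = X.*exp(-1j*2*pi*f*tstart);                 % account for the start time

err = max(abs(X - exp(-pi*f.^2)));    % compare with the analytical transform
```

The agreement is essentially at machine precision here because the Gaussian is well contained in both time and frequency. For signals such as sinc pulses, truncation in time leads to visibly larger errors.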


4(a) Use the function contFT to compute the Fourier transform of $s(t) = 3\,\mathrm{sinc}(2t - 3)$, where
the unit of time is a microsecond, the signal is sampled at the rate of 16 MHz, and truncated
to the range $[-8, 8]$ microseconds. We wish to attain a frequency resolution of 1 KHz or better.
Plot  the  magnitude  of  the  Fourier  transform  versus  frequency,  making  sure  you  specify  the  units 
on  the  frequency  axis.  Check  that  the  plot  conforms  to  your  expectations. 

(b)  Plot  the  phase  of  the  Fourier  transform  obtained  in  (a)  versus  frequency  (again,  make  sure 
the  units  on  the  frequency  axis  are  specified).  What  is  the  range  of  frequencies  over  which  the 
phase  plot  has  meaning? 

Matched  filter  in  frequency  domain 

5(a) Consider the signal s(t) in 3(c). Assuming that the unit of time is a millisecond and the
desired frequency resolution is 1 Hz, use the function contFT to compute and plot $|S(f)|$.

(b) Use the function contFT to compute and plot the magnitude of the Fourier transform of the
convolution $s * s_{MF}$ numerically computed in 3(d). Also plot for comparison $|S(f)|^2$, using the
output of 5(a). The two plots should match.

(c) Plot the phase of the Fourier transform of $s * s_{MF}$ obtained in 5(b). Comment on whether
the plot matches your expectations.

Lab  Report 

•  Discuss  the  results  you  obtain,  answer  any  specific  questions  that  are  asked,  and  print  out 
the  most  useful  plots  to  support  your  answers. 

•  Append  your  programs  to  the  report.  Make  sure  you  comment  them  in  enough  detail  so 
they  are  easy  to  understand.  In  addition  to  the  functions  you  are  asked  to  write,  label  the 
code  fragments  used  for  each  assigned  segment  (1  through  5)  separately. 

•  Write  a  paragraph  about  any  questions  or  confusions  that  you  may  have  experienced  with 
this  lab. 


Software  Lab  2.1:  Modeling  Carrier  Phase  Uncertainty 

Lab Objectives: The goal of this lab is to explore modeling and receiver operations in complex
baseband. In particular, we model and undo the effect of carrier phase mismatch between the
receiver LO and the incoming carrier.

Reading:  Section  2.8  (complex  baseband  basics). 


Laboratory  Assignment 


Consider a pair of independently modulated signals, $u_c(t) = \sum_{n=1}^{N} b_c[n]\, p(t - n)$ and
$u_s(t) = \sum_{n=1}^{N} b_s[n]\, p(t - n)$, where the symbols $b_c[n]$, $b_s[n]$ are chosen with equal
probability to be +1 and -1, and $p(t) = I_{[0,1]}(t)$ is a rectangular pulse. Let N = 100.

(1.1)  Use  Matlab  to  plot  a  typical  realization  of  uc(t)  and  us(t)  over  10  symbols.  Make  sure  you 
sample  fast  enough  for  the  plot  to  look  reasonably  “nice.” 

(1.2) Upconvert the baseband waveform $u_c(t)$ to get

$$u_{p,1}(t) = u_c(t) \cos 40\pi t$$

This is a so-called binary phase shift keyed (BPSK) signal, since the changes in phase are due to
the changes in the signs of the transmitted symbols. Plot the passband signal $u_{p,1}(t)$ over four
symbols (you will need to sample at a multiple of the carrier frequency for the plot to look nice,
which means you might have to go back and increase the sampling rate beyond what was required
for the baseband plots to look nice).

(1.3) Now, add in the Q component to obtain the passband signal

$$u_p(t) = u_c(t) \cos 40\pi t - u_s(t) \sin 40\pi t$$

Plot the resulting Quaternary Phase Shift Keyed (QPSK) signal $u_p(t)$ over four symbols.

(1.4) Downconvert $u_p(t)$ by passing $2u_p(t) \cos(40\pi t + \theta)$ and $2u_p(t) \sin(40\pi t + \theta)$ through crude
lowpass filters with impulse response $h(t) = I_{[0,0.25]}(t)$. Denote the resulting I and Q components
by $v_c(t)$ and $v_s(t)$, respectively. Plot $v_c$ and $v_s$ for $\theta = 0$ over 10 symbols. How do they compare
to $u_c$ and $u_s$? Can you read off the corresponding bits $b_c[n]$ and $b_s[n]$ from eyeballing the plots
for $v_c$ and $v_s$?

(1.5) Plot $v_c$ and $v_s$ for $\theta = \pi/4$. How do they compare to $u_c$ and $u_s$? Can you read off the
corresponding bits $b_c[n]$ and $b_s[n]$ from eyeballing the plots for $v_c$ and $v_s$?

(1.6) Figure out how to recover $u_c$ and $u_s$ from $v_c$ and $v_s$ if a genie tells you the value of $\theta$ (we are
looking for an approximate reconstruction: the LPFs used in downconversion are non-ideal, and
the original waveforms are not exactly bandlimited). Check whether your method for undoing
the phase offset works for $\theta = \pi/4$, the scenario in (1.5). Plot the resulting reconstructions $\hat{u}_c$ and
$\hat{u}_s$, and compare them with the original I and Q components. Can you read off the corresponding
bits $b_c[n]$ and $b_s[n]$ from eyeballing the plots for $\hat{u}_c$ and $\hat{u}_s$?

Lab  Report 

•  Answer  all  questions  and  print  out  the  most  useful  plots  to  support  your  answers. 

•  Write  a  paragraph  about  any  questions  or  confusions  that  you  may  have  experienced  with 
this  lab. 


Software  Lab  2.2:  Modeling  a  lamppost  based  broadband  network 

Lab  Objectives:  The  goal  of  this  lab  is  to  illustrate  how  wireless  multipath  channels  can  be 
modeled  in  complex  baseband. 

Reading:  The  background  for  this  lab  is  provided  in  Section  2.9,  which  discusses  wireless  channel 
modeling.  This  material  should  be  reviewed  prior  to  doing  the  lab. 

Laboratory  Assignment 

Consider  a  lamppost-based  network  supplying  broadband  access  using  unlicensed  spectrum  at  5 
GHz.  Figure  2.40  shows  two  kinds  of  links:  lamppost-to-lamppost  for  backhaul,  and  lamppost- 
to-mobile  for  access,  where  we  show  nominal  values  of  antenna  heights  and  distances.  We  explore 
simple  channel  models  for  each  case,  consisting  only  of  the  direct  path  and  the  ground  reflection. 
For simplicity, assume throughout that $A_1 = 1$, $\phi_1 = 0$ for the direct path, and $A_2 = 0.98$, $\phi_2 = \pi$
for the ground reflection (we assume a phase shift of $\pi$ for the reflected ray even though it may
not be at grazing incidence, especially for the lamppost-to-mobile link).
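As a starting point for the items that follow, here is a minimal Python sketch of the two-ray model described above. It rests on several assumptions that are not stated in this assignment: each path is taken to contribute a complex gain $A_i e^{j\phi_i} e^{-j 2\pi f_c \tau_i}$ with $\tau_i$ equal to path length divided by the speed of light, the reflected path length is found from the mirror image of the transmitter in a flat ground plane, and the function name `two_ray` and its argument names are hypothetical. Check the conventions against Section 2.9 before relying on it.

```python
import cmath
import math

C_LIGHT = 299_792_458.0  # speed of light in m/s


def two_ray(h_tx, h_rx, distance, f_c, A1=1.0, phi1=0.0, A2=0.98, phi2=math.pi):
    """Two-ray (direct + ground reflection) complex baseband gain.

    h_tx, h_rx: antenna heights in meters; distance: horizontal separation in
    meters; f_c: carrier frequency in Hz. Returns (h, h_nom, delay_spread),
    where h_nom is the gain of the direct path alone.
    Assumption: amplitudes A1, A2 are fixed and do not vary with range.
    """
    d_direct = math.hypot(distance, h_tx - h_rx)
    # Image method: reflect the transmitter in the ground plane.
    d_reflect = math.hypot(distance, h_tx + h_rx)
    tau1 = d_direct / C_LIGHT
    tau2 = d_reflect / C_LIGHT
    h_nom = A1 * cmath.exp(1j * (phi1 - 2 * math.pi * f_c * tau1))
    h_ref = A2 * cmath.exp(1j * (phi2 - 2 * math.pi * f_c * tau2))
    return h_nom + h_ref, h_nom, tau2 - tau1


# Nominal lamppost-to-lamppost geometry: both antennas at 10 m, 200 m apart, 5 GHz.
h, h_nom, delay_spread = two_ray(10.0, 10.0, 200.0, 5e9)
gain_db = 20 * math.log10(abs(h) / abs(h_nom))
print(f"delay spread = {delay_spread * 1e9:.2f} ns, normalized gain = {gain_db:.1f} dB")
```

Sweeping `h_rx` over a range of heights and collecting `abs(h) / abs(h_nom)` gives the kind of curve asked for in the fading items; the sketch deliberately stops short of that.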

(2.1)  Find  the  delay  spread  and  coherence  bandwidth  for  the  lamppost-to-lamppost  link.  If  the 
message  signal  has  20  MHz  bandwidth,  is  it  “narrowband”  with  respect  to  this  channel? 

(2.2)  Repeat  item  (2.1)  for  the  lamppost-to-car  link  when  the  car  is  100  m  away  from  each 
lamppost. 

Fading  and  diversity  for  the  backhaul  link 


Figure 2.40: Links in a lamppost-based network. (Labels: Lamppost 1, Lamppost 2, direct path (200 m).)


First,  let  us  explore  the  sensitivity  of  the  lamppost  to  lamppost  link  to  variations  in  range  and 
height.  Fix  the  height  of  the  transmitter  on  lamppost  1  at  10  m.  Vary  the  height  of  the  receiver 
on  lamppost  2  from  9.5  to  10.5  m. 

(2.3) Letting $h_{nom}$ denote the nominal channel gain between two lampposts if you only consider
the direct path and $h$ the net complex gain including the reflected path, plot the normalized
power gain in dB, $20 \log_{10} \frac{|h|}{|h_{nom}|}$, as a function of the variation in the receiver height. Comment
on the sensitivity of channel quality to variations in the receiver height.

(2.4) Modeling the variations in receiver height as coming from a uniform distribution over
(9.5, 10.5), find the probability that the normalized power gain is smaller than -20 dB (i.e., that
we have a fade in signal power of 20 dB or worse).

(2.5) Now, suppose that the transmitter has two antennas, vertically spaced by 25 cm, with
the lower one at a height of 10 m. Let $h_1$ and $h_2$ denote the channels from the two antennas
to the receiver. Let $h_{nom}$ be defined as in item (2.3). Plot the normalized power gains in dB,
$20 \log_{10} \frac{|h_i|}{|h_{nom}|}$, $i = 1, 2$. Comment on whether or not both gains dip or peak at the same time.

(2.6) Plot $20 \log_{10} \frac{\max(|h_1|, |h_2|)}{|h_{nom}|}$, which is the normalized power gain you would get if you switched
to the transmit antenna which has the better channel. This strategy is termed switched diversity.

(2.7)  Find  the  probability  that  the  normalized  power  gain  of  the  switched  diversity  scheme  is 
smaller  than  -20  dB. 

(2.8)  Comment  on  whether,  and  to  what  extent,  diversity  helped  in  combating  fading. 

Fading  on  the  access  link 

Consider the access channel from lamppost 1 to the car. Let $h_{nom}(D)$ denote the nominal channel
gain from the lamppost to the car, ignoring the ground reflection. Taking into account the ground
reflection, let the channel gain be denoted as $h(D)$. Here $D$ is the distance of the car from the
bottom of lamppost 1, as shown in Figure 2.40.

(2.9) Plot $|h_{nom}|$ and $|h|$ as a function of $D$ on a dB scale (an amplitude $a$ is expressed on the dB
scale as $20 \log_{10} a$). Comment on the “long-term” variation due to range, and the “short-term”
variation due to multipath fading.

Lab  Report 


•  Answer  all  questions  and  print  out  the  most  useful  plots  to  support  your  answers. 

•  Write  a  paragraph  about  any  questions  or  confusions  that  you  may  have  experienced  with 
this lab.


Chapter 3

Analog Communication Techniques


Modulation  is  the  process  of  encoding  information  into  a  signal  that  can  be  transmitted  (or 
recorded)  over  a  channel  of  interest.  In  analog  modulation,  a  baseband  message  signal,  such 
as  speech,  audio  or  video,  is  directly  transformed  into  a  signal  that  can  be  transmitted  over 
a  designated  channel,  typically  a  passband  radio  frequency  (RF)  channel.  Digital  modulation 
differs  from  this  only  in  the  following  additional  step:  bits  are  encoded  into  baseband  message 
signals, which are then transformed into passband signals to be transmitted. Thus, despite
the relentless transition from analog to digital modulation, many of the techniques developed for
analog  communication  systems  remain  important  for  the  digital  communication  systems  designer, 
and  our  goal  in  this  chapter  is  to  study  an  important  subset  of  these  techniques,  using  legacy 
analog  communication  systems  as  examples  to  reinforce  concepts. 

From  Chapter  2,  we  know  that  passband  signals  carry  information  in  their  complex  envelope, 
and  that  the  complex  envelope  can  be  represented  either  in  terms  of  I  and  Q  components,  or  in 
terms of envelope and phase. We study two broad classes of techniques: amplitude modulation,
in which the analog message signal appears directly in the I and/or Q components; and
angle modulation, in which the analog message signal appears directly in the phase or in the
instantaneous frequency (i.e., in the derivative of the phase), of the transmitted signal. Examples
of  analog  communication  in  space  include  AM  radio,  FM  radio,  and  broadcast  television,  as  well 
as  a  variety  of  specialized  radios.  Examples  of  analog  communication  in  time  (i.e.,  for  storage) 
include  audiocassettes  and  VHS  videotapes. 

The analog-centric techniques covered in this chapter include envelope detection, superheterodyne
reception, limiter discriminators, and phase locked loops. At a high level, these techniques
tell  us  how  to  go  from  baseband  message  signals  to  passband  transmitted  signals,  and  back 
from  passband  received  signals  to  baseband  message  signals.  For  analog  communication,  this 
is  enough,  since  we  consider  continuous  time  message  signals  which  are  directly  transformed  to 
passband  through  amplitude  or  angle  modulation.  For  digital  communication,  we  need  to  also 
figure out how to decode the encoded bits from the received passband signal, typically after
downconversion to baseband; this is a subject discussed in later chapters. However, between encoding
at  the  transmitter  and  decoding  at  the  receiver,  a  number  of  analog  communication  techniques 
are  relevant:  for  example,  we  need  to  decide  between  direct  and  superheterodyne  architectures 
for  upconversion  and  downconversion,  and  tailor  our  frequency  planning  appropriately;  we  may 
use  a  PLL  to  synthesize  the  local  oscillator  frequencies  at  the  transmitter  and  receiver;  and 
the  basic  techniques  for  mapping  baseband  signals  to  passband  remain  the  same  (amplitude 
and/or  angle  modulation).  In  addition,  while  many  classical  analog  processing  functionalities 
are  replaced  by  digital  signal  processing  in  modern  digital  communication  transceivers,  when  we 
push  the  limits  of  digital  communication  systems,  in  terms  of  lowering  power  consumption  or 
increasing  data  rates,  it  is  often  necessary  to  fall  back  on  analog-centric,  or  hybrid  digital-analog, 
techniques. This is because the analog-to-digital conversion required for digital transceiver
implementations may often be too costly or power-hungry for ultra high-speed, or ultra low-power,
implementations.

Chapter  Plan:  After  a  quick  discussion  of  terminology  and  notation  in  Section  3.1,  we  discuss 
various  forms  of  amplitude  modulation  in  Section  3.2,  including  bandwidth  requirements  and  the 
tradeoffs  between  power  efficiency  and  simplicity  of  demodulation.  We  discuss  angle  modulation 
in  Section  3.3,  including  the  relation  between  phase  and  frequency  modulation,  the  bandwidth 
of  angle  modulated  signals,  and  simple  suboptimal  demodulation  strategies. 

The  superheterodyne  up/downconversion  architecture  is  discussed  in  Section  3.4,  and  the  design 
considerations  illustrated  via  the  example  of  analog  AM  radio.  The  phase  locked  loop  (PLL) 
is  discussed  in  Section  3.5,  including  discussion  of  applications  such  as  frequency  synthesis  and 
FM  demodulation,  linearized  modeling  and  analysis,  and  a  glimpse  of  the  insights  provided  by 
nonlinear  models.  Finally,  as  a  historical  note,  we  discuss  some  legacy  analog  communication 
systems  in  Section  3.6,  mainly  to  highlight  some  of  the  creative  design  choices  that  were  made 
in  times  when  sophisticated  digital  signal  processing  techniques  were  not  available.  This  last 
section  can  be  skipped  if  the  reader’s  interest  is  limited  to  learning  analog-centric  techniques  for 
digital  communication  system  design. 

Software: Software Lab 3.1 reinforces concepts in amplitude modulation, and shows how
envelope detection, used for analog amplitude modulation, actually remains relevant for
downconversion for systems where we are pushing the limits in terms of carrier frequency (e.g., coherent
optical communication). Angle modulation is explored further in Software Lab 3.2, which
includes an introduction to a digital communication technique based on angle modulation.


3.1  Terminology  and  notation 

Message  Signal:  In  the  remainder  of  this  chapter,  the  analog  baseband  message  signal  is 
denoted  by  m(t).  Depending  on  convenience  of  exposition,  we  shall  think  of  this  message  as 
either finite power or finite energy. Any message we would encounter in practice
would have finite energy when we consider a finite time interval. However, when modeling
transmissions  over  long  time  intervals,  it  is  useful  to  think  of  messages  as  finite  power  signals 
spanning  an  infinite  time  interval.  On  the  other  hand,  when  discussing  the  effect  of  the  message 
spectrum  on  the  spectrum  of  the  transmitted  signal,  it  may  be  convenient  to  consider  a  finite 
energy message signal. Since we consider physical message signals, the time domain signal is
real-valued, so that its Fourier transform (defined for a finite energy signal) is conjugate symmetric:
$M(f) = M^*(-f)$. For a finite power (infinite energy) message, recall from Chapter 2 that the
power is defined as a time average in the limit of an infinite observation interval, as follows:

$$\overline{m^2} = \lim_{T_0 \to \infty} \frac{1}{T_0} \int_0^{T_0} m^2(t)\, dt$$

Similarly, the DC value is defined as

$$\overline{m} = \lim_{T_0 \to \infty} \frac{1}{T_0} \int_0^{T_0} m(t)\, dt$$

We typically assume that the DC value of the message is zero: $\overline{m} = 0$.

A simple example, shown in Figure 3.1, that we shall use often is a finite-power sinusoidal
message signal, $m(t) = A_m \cos 2\pi f_m t$, whose spectrum consists of impulses at $\pm f_m$:
$M(f) = \frac{A_m}{2} \left( \delta(f - f_m) + \delta(f + f_m) \right)$. For this message, $\overline{m} = 0$ and $\overline{m^2} = A_m^2/2$.
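The two time averages quoted for the sinusoidal message can be checked numerically. The following Python sketch approximates the defining integrals with a midpoint Riemann sum over a long but finite interval; the helper name `time_average` and the particular values of `A_m`, `f_m`, and `T0` are illustrative assumptions, not values from the text.

```python
import math


def time_average(g, T0, num_samples=100_000):
    """Approximate (1/T0) * integral of g(t) over [0, T0] by a midpoint sum."""
    dt = T0 / num_samples
    return sum(g((k + 0.5) * dt) for k in range(num_samples)) * dt / T0


A_m = 2.0    # message amplitude (illustrative)
f_m = 5.0    # message frequency in Hz (illustrative)
T0 = 100.0   # observation interval in seconds, long compared with 1/f_m


def m(t):
    return A_m * math.cos(2 * math.pi * f_m * t)


dc_value = time_average(m, T0)                    # estimate of the DC value
power = time_average(lambda t: m(t) ** 2, T0)     # estimate of the power
print(dc_value, power, A_m ** 2 / 2)
```

Because `T0` here spans a whole number of message periods, the finite-interval averages already agree closely with the limiting values; for an interval that is not a multiple of the period, the agreement would only be approximate and would improve as `T0` grows.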

Transmitted  Signal:  When  the  signal  transmitted  over  the  channel  is  a  passband  signal,  it 
can  be  written  as  (see  Chapter  2) 

$$u_p(t) = u_c(t) \cos(2\pi f_c t) - u_s(t) \sin(2\pi f_c t) = e(t) \cos(2\pi f_c t + \theta(t))$$

where $f_c$ is a carrier frequency, $u_c(t)$ is the I component, $u_s(t)$ is the Q component, $e(t) \ge 0$ is the
envelope, and $\theta(t)$ is the phase. Modulation consists of encoding the message in $u_c(t)$ and $u_s(t)$, or
equivalently, in $e(t)$ and $\theta(t)$. In most of the analog amplitude modulation schemes considered,
the message modulates the I component (with the Q component occasionally playing a “supporting
role”) as discussed in Section 3.2. The exception is quadrature amplitude modulation, in
which both I and Q components carry separate messages. In phase and frequency modulation,
or angle modulation, the message directly modulates the phase $\theta(t)$ or its derivative, keeping the
envelope $e(t)$ unchanged.

Figure 3.1: Sinusoidal message and its spectrum. (a) Sinusoidal message waveform. (b) Sinusoidal message spectrum.


3.2  Amplitude  Modulation 

We  now  discuss  a  number  of  variants  of  amplitude  modulation,  in  which  the  baseband  message 
signal  modulates  the  amplitude  of  a  sinusoidal  carrier  whose  frequency  falls  in  the  passband  over 
which  we  wish  to  communicate. 


3.2.1  Double  Sideband  (DSB)  Suppressed  Carrier  (SC) 

Here,  the  message  m  modulates  the  I  component  of  the  passband  transmitted  signal  u  as  follows: 

$$u_{DSB}(t) = A m(t) \cos(2\pi f_c t) \tag{3.1}$$

Taking Fourier transforms, we have

$$U_{DSB}(f) = \frac{A}{2} \left( M(f - f_c) + M(f + f_c) \right) \tag{3.2}$$

The  time  domain  and  frequency  domain  DSB  signals  for  a  sinusoidal  message  are  shown  in  Figure 
3.2. 
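As a worked check of (3.1) and (3.2) for the sinusoidal message $m(t) = A_m \cos 2\pi f_m t$, whose spectrum is $M(f) = \frac{A_m}{2}(\delta(f - f_m) + \delta(f + f_m))$, the product-to-sum identity gives the two sidebands directly. The derivation below is a sketch added for illustration; it uses only the definitions already introduced.

```latex
\begin{align*}
u_{DSB}(t) &= A A_m \cos(2\pi f_m t)\cos(2\pi f_c t) \\
           &= \frac{A A_m}{2}\left[\cos\big(2\pi (f_c + f_m) t\big) + \cos\big(2\pi (f_c - f_m) t\big)\right] \\
U_{DSB}(f) &= \frac{A A_m}{4}\left[\delta(f - f_c - f_m) + \delta(f - f_c + f_m)
              + \delta(f + f_c - f_m) + \delta(f + f_c + f_m)\right]
\end{align*}
```

The same four impulses follow from substituting $M(f)$ into (3.2): the pair at $\pm(f_c + f_m)$ forms the upper sideband and the pair at $\pm(f_c - f_m)$ the lower sideband, with no impulse at $\pm f_c$ itself.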

As another example, consider the finite-energy message whose spectrum is shown in Figure 3.3.
Since the time domain message $m(t)$ is real-valued, its spectrum exhibits conjugate symmetry
(we have chosen a complex-valued message spectrum to emphasize the latter property). The
message bandwidth is denoted by $B$. The bandwidth of the DSB-SC signal is $2B$, which is twice
the message bandwidth. This indicates that we are being redundant in our use of spectrum. To
see this, consider the upper sideband (USB) and lower sideband (LSB) depicted in Figure 3.4.
The shape of the signal in the USB (i.e., $U_p(f)$ for $f_c < f < f_c + B$) is the same as that of the
message for positive frequencies (i.e., $M(f)$, $f > 0$). The shape of the signal in the LSB (i.e.,
$U_p(f)$ for $f_c - B < f < f_c$) is the same as that of the message for negative frequencies (i.e.,
$M(f)$, $f < 0$). Since $m(t)$ is real-valued, we have $M(-f) = M^*(f)$, so that we can reconstruct
the message if we know its content at either positive or negative frequencies. Thus, the USB and
LSB of $u(t)$ each contain enough information to reconstruct the message. The term DSB refers to
the fact that we are sending both sidebands. Doing this, of course, is wasteful of spectrum. This
motivates single sideband (SSB) and vestigial sideband (VSB) modulation, which are discussed
a little later.

Figure 3.2: DSB-SC signal in the time and frequency domains for the sinusoidal message
$m(t) = A_m \cos 2\pi f_m t$ of Figure 3.1. (a) DSB time domain waveform. (b) DSB spectrum.

Figure 3.4: The spectrum of the passband DSB-SC signal for the example message in Figure 3.3.

The term suppressed carrier is employed because, for a message with no DC component, we
see from (3.2) that the transmitted signal does not have a discrete component at the carrier
frequency (i.e., $U_p(f)$ does not have impulses at $\pm f_c$).

Figure 3.5: Coherent demodulation for AM. (Labels: passband received signal; $2\cos 2\pi f_c t$.)

Demodulation of DSB-SC: Since the message is contained in the I component, demodulation
consists of extracting the I component of the received signal, which we know how to do from
Chapter 2: multiply the received signal with the cosine of the carrier, and pass it through a low
pass filter. Ignoring noise, the received signal is given by

$$y_p(t) = A m(t) \cos(2\pi f_c t + \theta_r) \tag{3.3}$$

where $\theta_r$ is the phase of the received carrier relative to the local copy of the carrier produced
by the receiver’s local oscillator (LO), and $A$ is the received amplitude, taking into account the
propagation channel from the transmitter to the receiver. The demodulator is shown in Figure
3.5. In order for this demodulator to work well, we must have $\theta_r$ as close to zero as possible;
that is, the carrier produced by the LO must be coherent with the received carrier. To see the
effect of phase mismatch, let us compute the demodulator output for arbitrary $\theta_r$. Using the
trigonometric identity $2\cos\theta_1 \cos\theta_2 = \cos(\theta_1 - \theta_2) + \cos(\theta_1 + \theta_2)$, we have

$$2 y_p(t) \cos(2\pi f_c t) = 2 A m(t) \cos(2\pi f_c t + \theta_r) \cos(2\pi f_c t) = A m(t) \cos\theta_r + A m(t) \cos(4\pi f_c t + \theta_r)$$

We recognize the second term on the extreme right-hand side as being a passband signal at $2f_c$
(since it is a baseband message multiplied by a carrier whose frequency exceeds the message
bandwidth). It is therefore rejected by the lowpass filter. The first term is a baseband signal
proportional to the message, which appears unchanged at the output of the LPF (except possibly
for scaling), as long as the LPF response has been designed to be flat over the message bandwidth.
The output of the demodulator is therefore given by

$$\hat{m}(t) = A m(t) \cos\theta_r \tag{3.4}$$

We can also infer this using the complex baseband representation, which is what we prefer to
employ instead of unwieldy trigonometric identities. The coherent demodulator in Figure 3.5
extracts the I component relative to the receiver’s LO. The received signal can be written as

$$y_p(t) = A m(t) \cos(2\pi f_c t + \theta_r) = \mathrm{Re}\left( A m(t) e^{j(2\pi f_c t + \theta_r)} \right) = \mathrm{Re}\left( A m(t) e^{j\theta_r} e^{j 2\pi f_c t} \right)$$

from which we can read off the complex envelope $y(t) = A m(t) e^{j\theta_r}$. The real part
$y_c(t) = A m(t) \cos\theta_r$ is the I component extracted by the demodulator.

The demodulator output (3.4) is proportional to the message, which is what we want, but
the proportionality constant varies with the phase of the received carrier relative to the LO.
In particular, the signal gets significantly attenuated as the phase mismatch increases, and gets
completely wiped out for $\theta_r = \frac{\pi}{2}$. Note that, if the carrier frequency of the LO is not synchronized
with that of the received carrier (say with frequency offset $\Delta f$), then $\theta_r(t) = 2\pi \Delta f\, t + \phi$ is a
time-varying phase that takes all values in $[0, 2\pi)$, which leads to time-varying signal degradation in
amplitude, as well as unwanted sign changes. Thus, for coherent demodulation to be successful,
we must drive $\Delta f$ to zero, and make $\phi$ as small as possible; that is, we must synchronize to
the received carrier. One possible approach is to use feedback-based techniques such as the phase
locked loop, discussed later in this chapter.
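The $\cos\theta_r$ scaling in (3.4) is easy to observe numerically. The Python sketch below mixes a DSB-SC signal with $2\cos(2\pi f_c t)$ and applies a crude lowpass filter; the moving-average filter, the sinusoidal test message, the function name `demod_peak`, and all parameter values are illustrative assumptions rather than anything prescribed by the text.

```python
import math

f_c = 100.0              # carrier frequency in Hz (illustrative)
f_m = 1.0                # message frequency in Hz (illustrative)
A = 1.0                  # received amplitude
samples_per_cycle = 20   # samples per carrier period
fs = samples_per_cycle * f_c
num_samples = int(2 * fs)   # two seconds of signal


def message(t):
    return math.cos(2 * math.pi * f_m * t)


def demod_peak(theta_r):
    """Peak magnitude of the coherent demodulator output for phase offset theta_r."""
    mixed = []
    for n in range(num_samples):
        t = n / fs
        y_p = A * message(t) * math.cos(2 * math.pi * f_c * t + theta_r)
        mixed.append(2 * y_p * math.cos(2 * math.pi * f_c * t))
    # Crude lowpass filter: moving average over one carrier period, which
    # spans two full cycles of the 2*f_c term and so nearly cancels it.
    w = samples_per_cycle
    out = [sum(mixed[n - w:n]) / w for n in range(w, num_samples + 1)]
    return max(abs(v) for v in out)


for theta_r in (0.0, math.pi / 3, math.pi / 2):
    print(f"theta_r = {theta_r:.3f} rad: peak output = {demod_peak(theta_r):.3f}, "
          f"A*|cos(theta_r)| = {A * abs(math.cos(theta_r)):.3f}")
```

The peak output tracks $A|\cos\theta_r|$ closely, with a small residual because the moving average is not an ideal lowpass filter and the message is not exactly constant over the averaging window.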


3.2.2  Conventional  AM 

In  conventional  AM,  we  add  a  large  carrier  component  to  a  DSB-SC  signal,  so  that  the  passband 
transmitted  signal  is  of  the  form: 

$$u_{AM}(t) = A m(t) \cos(2\pi f_c t) + A_c \cos(2\pi f_c t) \tag{3.5}$$

Taking the Fourier transform, we have

$$U_{AM}(f) = \frac{A}{2} \left( M(f - f_c) + M(f + f_c) \right) + \frac{A_c}{2} \left( \delta(f - f_c) + \delta(f + f_c) \right)$$

which  means  that,  in  addition  to  the  USB  and  LSB  due  to  the  message  modulation,  we  also  have 
impulses  at  ±fc  due  to  the  unmodulated  carrier.  Figure  3.6  shows  the  resulting  spectrum. 

The  key  concept  behind  conventional  AM  is  that,  by  making  Ac  large  enough,  the  message  can  be 
demodulated  using  a  simple  envelope  detector.  Large  Ac  corresponds  to  expending  transmitter 
power  on  sending  an  unmodulated  carrier  which  carries  no  message  information,  in  order  to 
simplify  the  receiver.  This  tradeoff  makes  sense  in  a  broadcast  context,  where  one  powerful 
transmitter may be sending information to a large number of low-cost receivers, and is the
design approach that has been adopted for broadcast AM radio. A more detailed discussion
follows.

Figure 3.6: The spectrum of a conventional AM signal for the example message in Figure 3.3.

The envelope of the AM signal in (3.5) is given by

$$e(t) = |A m(t) + A_c|$$

If the term inside the magnitude operation is always nonnegative, we have $e(t) = A m(t) + A_c$.
In this case, we can read off the message signal directly from the envelope, using AC coupling to
get rid of the DC offset due to the second term. For this to happen, we must have

$$A m(t) + A_c \ge 0 \ \text{ for all } t \iff A \min_t m(t) + A_c \ge 0 \tag{3.6}$$

Let $\min_t m(t) = -M_0$, where $M_0 = |\min_t m(t)|$. (Note that the minimum value of the message
must be negative if the message has zero DC value.) Equation (3.6) reduces to $-A M_0 + A_c \ge 0$,
or $A_c \ge A M_0$. Let us define the modulation index $a_{mod}$ as the ratio of the size of the biggest
negative incursion due to the message term to the size of the unmodulated carrier term:

$$a_{mod} = \frac{A M_0}{A_c} = \frac{A |\min_t m(t)|}{A_c}$$

The condition (3.6) for accurately recovering the message using envelope detection can now be
rewritten as

$$a_{mod} \le 1 \tag{3.7}$$

It is also convenient to define a normalized version of the message as follows:

$$m_n(t) = \frac{m(t)}{M_0} = \frac{m(t)}{|\min_t m(t)|} \tag{3.8}$$

which satisfies

$$\min_t m_n(t) = \frac{\min_t m(t)}{M_0} = -1$$

It is easy to see that the AM signal (3.5) can be rewritten as

$$u_{AM}(t) = A_c \left( 1 + a_{mod} m_n(t) \right) \cos(2\pi f_c t) \tag{3.9}$$

which clearly brings out the role of modulation index in ensuring that envelope detection works.

Figure 3.7: Time domain AM waveforms for a sinusoidal message. The envelope no longer follows
the message for modulation index larger than one. (a) Modulation index $a_{mod} = 0.5$. (b) Modulation
index $a_{mod} = 1.0$. (c) Modulation index $a_{mod} = 1.5$.

Figure 3.8: Envelope detector demodulation of AM. The envelope detector output is typically
passed through a DC blocking capacitance (not shown) to eliminate the DC offset due to the
carrier component of the AM signal.

Figure 3.9: The relation between the envelope detector output $v_{out}(t)$ (shown in bold) and input
$v_{in}(t)$ (shown as dashed line). The output closely follows the envelope (shown as dotted line).

Figure 3.7 illustrates the impact of modulation index on the viability of envelope detection, where
the message signal is the sinusoidal message in Figure 3.1. For $a_{mod} = 0.5$ and $a_{mod} = 1$, we see
that the envelope equals a scaled and DC-shifted version of the message. For $a_{mod} = 1.5$, we see that
the envelope no longer follows the shape of the message.
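The three cases in Figure 3.7 can be reproduced numerically from the definitions of the modulation index and the envelope. In the Python sketch below, the function names `modulation_index` and `envelope_follows_message`, the sampled unit-amplitude sinusoidal message, and the specific values of `A` and `A_c` are assumptions chosen to give modulation indices of 0.5, 1, and 1.5.

```python
import math


def modulation_index(m_samples, A, A_c):
    """a_mod = A * |min_t m(t)| / A_c for a sampled zero-DC message."""
    M0 = abs(min(m_samples))
    return A * M0 / A_c


def envelope_follows_message(m_samples, A, A_c, tol=1e-12):
    """True if e(t) = |A m(t) + A_c| equals A m(t) + A_c at every sample."""
    return all(abs(abs(A * mk + A_c) - (A * mk + A_c)) <= tol for mk in m_samples)


# One period of a unit-amplitude sinusoidal message, 100 samples.
m_samples = [math.cos(2 * math.pi * k / 100) for k in range(100)]
A = 1.0
for A_c in (2.0, 1.0, 2.0 / 3.0):
    a_mod = modulation_index(m_samples, A, A_c)
    ok = envelope_follows_message(m_samples, A, A_c)
    print(f"A_c = {A_c:.3f}: a_mod = {a_mod:.2f}, envelope follows message: {ok}")
```

The check fails only in the third case, where the carrier is too weak to keep $A m(t) + A_c$ nonnegative, so the magnitude operation folds the negative excursions back up.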

Demodulation  of  Conventional  AM:  Ignoring  noise,  the  received  signal  is  given  by 

$$y_p(t) = B \left( 1 + a_{mod} m_n(t) \right) \cos(2\pi f_c t + \theta_r) \tag{3.10}$$

where $\theta_r$ is a phase offset which is unknown a priori, if we do not perform carrier synchronization.
However, as long as $a_{mod} \le 1$, we can recover the message without knowing $\theta_r$ using envelope
detection, since the envelope is still just a scaled and DC-shifted version of the message. Of
course, the message can also be recovered by coherent detection, since the I component of the
received carrier equals a scaled and DC-shifted version of the message. However, by doing
envelope detection instead, we can avoid carrier synchronization, thus reducing receiver complexity
drastically. An envelope detector is shown in Figure 3.8, and an example (where the envelope
is a straight line) showing how it works is depicted in Figure 3.9. The diode (we assume that it
is ideal) conducts in only the forward direction, when the input voltage $v_{in}(t)$ of the passband
signal is larger than the output voltage $v_{out}(t)$ across the RC filter. When this happens, the
output voltage becomes equal to the input voltage instantaneously (under the idealization that
the diode has zero resistance). In this regime, we have $v_{out}(t) = v_{in}(t)$. When the input voltage is
smaller than the output voltage, the diode does not conduct, and the capacitor starts discharging
through the resistor with time constant $RC$. As shown in Figure 3.9, in this regime, starting at
time $t_1$, we have $v_{out}(t) = v_1 e^{-(t - t_1)/RC}$, where $v_1 = v_{out}(t_1)$.
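The two regimes just described (track the input while the diode conducts, decay exponentially otherwise) translate into a one-line update rule in discrete time. The Python sketch below simulates an idealized detector on a conventional AM signal; the function name `envelope_detector`, the sampling rate, and the carrier, message, and time-constant values are illustrative assumptions.

```python
import math

f_c = 500e3      # carrier frequency in Hz (illustrative)
f_m = 5e3        # sinusoidal message frequency in Hz (illustrative)
RC = 20e-6       # detector time constant in seconds (illustrative)
a_mod = 0.5      # modulation index
A_c = 1.0        # carrier amplitude

samples_per_cycle = 40
dt = 1.0 / (samples_per_cycle * f_c)
num_samples = int(round(2.0 / f_m / dt))   # two message periods


def envelope_detector(v_in, dt, RC):
    """Ideal diode followed by a parallel RC: track the input when it exceeds
    the held voltage, otherwise let the capacitor discharge."""
    decay = math.exp(-dt / RC)
    v_out, v = [], 0.0
    for x in v_in:
        v = max(x, v * decay)
        v_out.append(v)
    return v_out


t = [n * dt for n in range(num_samples)]
env = [A_c * (1.0 + a_mod * math.cos(2 * math.pi * f_m * tn)) for tn in t]
v_in = [e * math.cos(2 * math.pi * f_c * tn) for e, tn in zip(env, t)]
v_out = envelope_detector(v_in, dt, RC)

worst_droop = max((e - v) / e for e, v in zip(env, v_out))
print(f"worst relative droop below the envelope: {worst_droop:.3f}")
```

With these values the output stays at or just below the true envelope, sagging between carrier peaks by an amount set by the ratio of the carrier period to the time constant.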

Roughly  speaking,  the  capacitor  gets  charged  at  each  carrier  peak,  and  discharges  between  peaks. 
The time interval between successive charging episodes is therefore approximately equal to $1/f_c$, the time between successive carrier peaks. The factor by which the output voltage is reduced during this period due to capacitor discharge is $\exp(-1/(f_c RC))$. This must be close to one in order for the voltage to follow the envelope, rather than the variations in the sinusoidal carrier. That is, we must have $f_c RC \gg 1$. On the other hand, the decay in the envelope detector output must be fast enough (i.e., the RC time constant must be small enough) so that it can follow changes in the envelope. Since the time constant for envelope variations is inversely proportional to the message bandwidth $B$, we must have $RC \ll 1/B$. Combining these two conditions for envelope detection to work well, we have

$$\frac{1}{f_c} \ll RC \ll \frac{1}{B} \qquad (3.11)$$

This of course requires that $f_c \gg B$ (carrier frequency much larger than message bandwidth), which is typically satisfied in practice. For example, the carrier frequencies in broadcast AM radio are over 500 KHz, whereas the message bandwidth is limited to 5 KHz. Applying (3.11), the RC time constant for an envelope detector should be chosen so that

$$2\ \mu s \ll RC \ll 200\ \mu s$$

In this case, a good choice of parameters would be $RC = 20\ \mu s$, for example, with $R = 50$ ohms and $C = 400$ nanofarads.
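
The design rule (3.11) can be checked numerically. The following Python sketch computes the two bounds on the RC time constant for the broadcast AM numbers above, and the per-cycle droop factor $\exp(-1/(f_c RC))$ for the suggested component values. The function name `rc_bounds` is hypothetical, and using the geometric mean of the two bounds as a midpoint is an assumption made here for illustration, not a rule stated in the text.

```python
import math

def rc_bounds(fc_hz, bandwidth_hz):
    """Return (lower, upper, geometric_mean) for the RC time constant in seconds.

    Condition (3.11) asks for 1/fc << RC << 1/B. The geometric mean is only an
    illustrative midpoint (an assumption of this sketch).
    """
    lower = 1.0 / fc_hz
    upper = 1.0 / bandwidth_hz
    return lower, upper, math.sqrt(lower * upper)

# Broadcast AM numbers from the text: fc = 500 kHz, B = 5 kHz.
lower, upper, rc_mid = rc_bounds(500e3, 5e3)

# Component values suggested in the text: R = 50 ohms, C = 400 nF.
rc = 50.0 * 400e-9

# Factor by which the output decays between successive carrier peaks.
droop = math.exp(-1.0 / (500e3 * rc))
```

For these values the midpoint coincides with the 20 microsecond choice in the text, and the output decays by a factor of roughly 0.9 between carrier peaks.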

Software  Lab  3.1  introduces  a  different  application  of  envelope  detection.  Adding  a  strong  carrier 
component  at  the  receiver,  followed  by  envelope  detection,  provides  an  alternative  approach  to 
downconversion that avoids the use of mixers, which are difficult to implement at very high
carrier  frequencies  (e.g.,  for  coherent  optical  communication). 

Power efficiency of conventional AM: The price we pay for the receiver simplicity of conventional AM is power inefficiency: in (3.5) the unmodulated carrier $A_c \cos(2\pi f_c t)$ is not carrying any information regarding the message. We now compute the power efficiency $\eta_{AM}$, which is defined as the ratio of the transmitted power due to the message-bearing term $A m(t) \cos(2\pi f_c t)$ to the total power of $u_{AM}(t)$. In order to express the result in terms of the modulation index, let us use the expression (3.9).

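$$\overline{u_{AM}^2(t)} = \overline{A_c^2 (1 + a_{mod} m_n(t))^2 \cos^2(2\pi f_c t)} = \overline{\frac{A_c^2}{2} (1 + a_{mod} m_n(t))^2} + \overline{\frac{A_c^2}{2} (1 + a_{mod} m_n(t))^2 \cos(4\pi f_c t)}$$

The second term on the right-hand side is the DC value of a passband signal at $2f_c$, which is zero. Expanding out the first term, we have

$$\overline{u_{AM}^2(t)} = \frac{A_c^2}{2} \left(1 + a_{mod}^2 \overline{m_n^2} + 2 a_{mod} \overline{m_n}\right) = \frac{A_c^2}{2} \left(1 + a_{mod}^2 \overline{m_n^2}\right) \qquad (3.12)$$

assuming that the message has zero DC value. The power of the message-bearing term can be similarly computed as

$$\overline{(A_c a_{mod} m_n(t))^2 \cos^2(2\pi f_c t)} = \frac{A_c^2}{2} a_{mod}^2 \overline{m_n^2}$$

so that the power efficiency is given by

$$\eta_{AM} = \frac{a_{mod}^2 \overline{m_n^2}}{1 + a_{mod}^2 \overline{m_n^2}} \qquad (3.13)$$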


Noting that $m_n$ is normalized so that its most negative value is $-1$, for messages which have comparable positive and negative excursions around zero, we expect $|m_n(t)| \le 1$, and hence average power $\overline{m_n^2} \le 1$ (typical values are much smaller than one). Since $a_{mod} \le 1$ for envelope detection to work, the power efficiency of conventional AM is at best 50%. For a sinusoidal message, for example, it is easy to see that $\overline{m_n^2} = 1/2$, so that the power efficiency is at most 33%. For speech signals, which have significantly higher peak-to-average ratio, the power efficiency is even smaller.
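
The two limiting numbers quoted above follow directly from (3.13). A minimal Python sketch, with the hypothetical function name `am_power_efficiency`, evaluates the formula at 100% modulation for a message with unit normalized power and for a sinusoidal message:

```python
def am_power_efficiency(a_mod, mean_square_mn):
    """Power efficiency (3.13) of conventional AM.

    a_mod is the modulation index and mean_square_mn is the average power of
    the normalized message m_n(t).
    """
    x = a_mod ** 2 * mean_square_mn
    return x / (1.0 + x)

# Upper limit: a_mod = 1 and normalized message power equal to 1.
eta_best = am_power_efficiency(1.0, 1.0)

# Sinusoidal message at 100% modulation: normalized message power is 1/2.
eta_sine = am_power_efficiency(1.0, 0.5)
```

These evaluate to one half and one third respectively, matching the 50% and 33% figures in the text.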


Example 3.2.1 (AM power efficiency computation): The message $m(t) = 2\sin 2000\pi t - 3\cos 4000\pi t$ is used in an AM system with a modulation index of 70% and carrier frequency of 580 KHz. What is the power efficiency? If the net transmitted power is 10 watts, find the magnitude spectrum of the transmitted signal.

We need to find $M_0 = |\min_t m(t)|$ in order to determine the normalized form $m_n(t) = m(t)/M_0$. To simplify notation, let $x = 2000\pi t$, and minimize $g(x) = 2\sin x - 3\cos 2x$. Since $g$ is periodic with period $2\pi$, we can minimize it numerically over a period. However, we can perform the minimization analytically in this case. Differentiating $g$, we obtain

$$g'(x) = 2\cos x + 6\sin 2x = 0$$

This gives

$$2\cos x + 12\sin x \cos x = 2\cos x\,(1 + 6\sin x) = 0$$

There are two solutions: $\cos x = 0$ and $\sin x = -\frac{1}{6}$. The first solution gives $\cos 2x = 2\cos^2 x - 1 = -1$ and $\sin x = \pm 1$, which gives $g(x) = 1, 5$. The second solution gives $\cos 2x = 1 - 2\sin^2 x = 1 - 2/36 = 17/18$, which gives $g(x) = 2(-1/6) - 3(17/18) = -19/6$. We therefore obtain

$$M_0 = |\min_t m(t)| = 19/6$$

This gives

$$m_n(t) = \frac{m(t)}{M_0} = \frac{12}{19}\sin 2000\pi t - \frac{18}{19}\cos 4000\pi t$$

so that

$$\overline{m_n^2} = (12/19)^2 (1/2) + (18/19)^2 (1/2) = 0.65$$

Substituting in (3.13), setting $a_{mod} = 0.7$, we obtain a power efficiency $\eta_{AM} = 0.24$, or 24%.
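
The example notes that the minimization could also be done numerically over a period. The following Python sketch does exactly that, as a cross-check of the analytical values. The grid size `N` and the variable names are choices made here for illustration; the carrier amplitude is obtained from (3.12) with the stated total power of 10 watts.

```python
import math

def message(t):
    """Message of Example 3.2.1, with t in seconds."""
    return 2.0 * math.sin(2000 * math.pi * t) - 3.0 * math.cos(4000 * math.pi * t)

N = 100_000          # number of samples over one period (illustrative choice)
period = 1e-3        # the message repeats every 1 ms
samples = [message(k * period / N) for k in range(N)]

# M0 is the magnitude of the most negative message value.
m0 = abs(min(samples))

# Average power of the normalized message m_n(t) = m(t) / M0.
mean_square_mn = sum(s * s for s in samples) / N / m0 ** 2

a_mod = 0.7
x = a_mod ** 2 * mean_square_mn
eta = x / (1.0 + x)                        # power efficiency, from (3.13)
a_c = math.sqrt(2 * 10.0 / (1.0 + x))      # carrier amplitude, from (3.12)
```

The numerical values agree with the analytical ones to the precision quoted in the example.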

To figure out the spectrum of the transmitted signal, we must find $A_c$ in the formula (3.9). The power of the transmitted signal is given by (3.12) to be

$$10 = \frac{A_c^2}{2} \left(1 + a_{mod}^2 \overline{m_n^2}\right) = \frac{A_c^2}{2} \left(1 + (0.7^2)(0.65)\right)$$

which yields $A_c \approx 3.9$. The overall AM signal is given by

$$u_{AM}(t) = A_c (1 + a_{mod} m_n(t)) \cos 2\pi f_c t = A_c (1 + a_1 \sin 2\pi f_1 t + a_2 \cos 4\pi f_1 t) \cos 2\pi f_c t$$

where $a_1 = 0.7(12/19) = 0.44$, $a_2 = 0.7(-18/19) = -0.66$, $f_1 = 1$ KHz and $f_c = 580$ KHz. The magnitude spectrum is given by

$$\begin{aligned}
|U_{AM}(f)| ={} & \frac{A_c}{2} \left(\delta(f - f_c) + \delta(f + f_c)\right) \\
& + \frac{A_c |a_1|}{4} \left(\delta(f - f_c - f_1) + \delta(f - f_c + f_1) + \delta(f + f_c + f_1) + \delta(f + f_c - f_1)\right) \\
& + \frac{A_c |a_2|}{4} \left(\delta(f - f_c - 2f_1) + \delta(f - f_c + 2f_1) + \delta(f + f_c + 2f_1) + \delta(f + f_c - 2f_1)\right)
\end{aligned}$$

with numerical values shown in Figure 3.10.

Figure 3.10: Magnitude spectrum for the AM waveform in Example 3.2.1.


3.2.3  Single  Sideband  Modulation  (SSB) 

In  SSB  modulation,  we  send  either  the  upper  sideband  or  the  lower  sideband  of  a  DSB-SC  signal. 
For  the  running  example,  the  spectra  of  the  passband  USB  and  LSB  signals  are  shown  in  Figure 
3.11. 

From  our  discussion  of  DSB-SC,  we  know  that  each  sideband  provides  enough  information  to 
reconstruct  the  message.  But  how  do  we  physically  reconstruct  the  message  from  an  SSB  signal? 
To  see  this,  consider  the  USB  signal  depicted  in  Figure  3.11(a).  We  can  reconstruct  the  baseband 
message if we can move the component near $+f_c$ to the left by $f_c$, and the component near $-f_c$ to the right by $f_c$; that is, if we move the passband components in towards the origin. These two frequency translations can be accomplished by multiplying the USB signal by $2\cos 2\pi f_c t = e^{j2\pi f_c t} + e^{-j2\pi f_c t}$, as shown in Figure 3.5, which creates the desired message signal at baseband, as well as undesired frequency components at $\pm 2f_c$ which can be rejected by a lowpass filter. It can be checked that the same argument applies to LSB signals as well.

Figure 3.11: Spectra for SSB signaling for the example message in Figure 3.3: (a) upper sideband signaling, (b) lower sideband signaling.

It follows from the preceding discussion that SSB signals can be demodulated in exactly the same fashion as DSB-SC, using the coherent demodulator depicted in Figure 3.5. Since this demodulator simply extracts the I component of the passband signal, the I component of the SSB signal must be the message. In order to understand the structure of an SSB signal, it remains to identify the Q component. This is most easily done by considering the complex envelope of the passband transmitted signal. Consider again the example USB signal in Figure 3.11(a). The spectrum $U(f)$ of its complex envelope relative to $f_c$ is shown in Figure 3.12. Now, the spectra of I and Q components can be inferred as follows:

$$U_c(f) = \frac{U(f) + U^*(-f)}{2}, \qquad U_s(f) = \frac{U(f) - U^*(-f)}{2j}$$

Applying these equations, we get I and Q components as shown in Figure 3.13.


Figure  3.12:  Complex  envelope  for  the  USB  signal  in  Figure  3.11(a). 


Figure 3.13: I and Q components for the USB signal in Figure 3.11(a).


Thus, up to scaling, the I component $U_c(f) = M(f)$, and the Q component is a transformation of the message given by

$$U_s(f) = -j\,\mathrm{sgn}(f)\,M(f) \qquad (3.14)$$

That is, the Q component is a filtered version of the message, where the filter transfer function is $H(f) = -j\,\mathrm{sgn}(f)$. This transformation is given a special name, the Hilbert transform.

Hilbert transform: The Hilbert transform of a signal $x(t)$ is denoted by $\check{x}(t)$, and is specified in the frequency domain as

$$\check{X}(f) = (-j\,\mathrm{sgn}(f))\,X(f)$$

This corresponds to passing $x$ through a filter with transfer function

$$H(f) = -j\,\mathrm{sgn}(f) \leftrightarrow h(t) = \frac{1}{\pi t}$$

where the derivation of the impulse response is left as an exercise.

Figure 3.14: Spectrum of the Hilbert transform of the example message in Figure 3.3.

Figure 3.14 shows the spectrum of the Hilbert transform of the example message in Figure 3.3. We see that it is the same (up to scaling) as the Q component of the USB signal, shown in Figure 3.13.

Physical interpretation of the Hilbert transform: If $x(t)$ is real-valued, then so is its Hilbert transform $\check{x}(t)$. Thus, the Fourier transforms $X(f)$ and $\check{X}(f)$ must both satisfy conjugate symmetry, and we only need to discuss what happens at positive frequencies. For $f > 0$, we have $\check{X}(f) = -j\,\mathrm{sgn}(f)X(f) = -jX(f) = e^{-j\pi/2}X(f)$. That is, the Hilbert transform simply imposes a $\pi/2$ phase lag at all (positive) frequencies, leaving the magnitude of the Fourier transform unchanged.

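Example 3.2.2 (Hilbert transform of a sinusoid): Based on the preceding argument, a sinusoid $s(t) = \cos(2\pi f_0 t + \phi)$ has Hilbert transform $\check{s}(t) = \cos(2\pi f_0 t + \phi - \frac{\pi}{2}) = \sin(2\pi f_0 t + \phi)$. We can also do this the hard way, as follows:

$$s(t) = \cos(2\pi f_0 t + \phi) = \frac{1}{2}\left(e^{j(2\pi f_0 t + \phi)} + e^{-j(2\pi f_0 t + \phi)}\right) \leftrightarrow S(f) = \frac{1}{2}\left(e^{j\phi}\delta(f - f_0) + e^{-j\phi}\delta(f + f_0)\right)$$

Thus,

$$\check{S}(f) = -j\,\mathrm{sgn}(f)S(f) = \frac{1}{2}\left(e^{j\phi}(-j)\delta(f - f_0) + e^{-j\phi}(j)\delta(f + f_0)\right) \leftrightarrow \check{s}(t) = \frac{1}{2}\left(e^{j\phi}(-j)e^{j2\pi f_0 t} + e^{-j\phi}(j)e^{-j2\pi f_0 t}\right)$$

which simplifies to

$$\check{s}(t) = \frac{1}{2j}\left(e^{j(2\pi f_0 t + \phi)} - e^{-j(2\pi f_0 t + \phi)}\right) = \sin(2\pi f_0 t + \phi)$$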
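
The frequency domain definition of the Hilbert transform can also be tried out on sampled data. The sketch below is a discrete-time stand-in, under the assumption that the signal is periodic over the sample window: it computes a DFT directly from the definition (adequate for short sequences), multiplies each bin by $-j\,\mathrm{sgn}(f)$, and inverts. The function names are hypothetical, and the handling of the DC and Nyquist bins (both set to zero) is a convention adopted here.

```python
import cmath
import math

def dft(x):
    """Direct O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * i * k / n) for k in range(n))
            for i in range(n)]

def idft(X):
    """Inverse of dft()."""
    n = len(X)
    return [sum(X[i] * cmath.exp(2j * math.pi * i * k / n) for i in range(n)) / n
            for k in range(n)]

def hilbert_transform(x):
    """Apply H(f) = -j sgn(f) bin by bin to a real sequence x."""
    n = len(x)
    Y = []
    for i, Xi in enumerate(dft(x)):
        if i == 0 or (n % 2 == 0 and i == n // 2):
            Y.append(0)            # sgn(0) = 0; Nyquist bin also zeroed
        elif i < n / 2:
            Y.append(-1j * Xi)     # positive frequencies
        else:
            Y.append(1j * Xi)      # negative frequencies
    return [y.real for y in idft(Y)]

# Sinusoid with 5 cycles in 64 samples and phase 0.3 rad (illustrative values).
n, cycles, phi = 64, 5, 0.3
s = [math.cos(2 * math.pi * cycles * k / n + phi) for k in range(n)]
s_check = hilbert_transform(s)
expected = [math.sin(2 * math.pi * cycles * k / n + phi) for k in range(n)]
max_err = max(abs(a - b) for a, b in zip(s_check, expected))
```

For this input the output matches $\sin(2\pi f_0 t + \phi)$ to within floating point error, in agreement with Example 3.2.2.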
Equation (3.14) shows that the Q component of the USB signal is $\check{m}(t)$, the Hilbert transform of the message. Thus, the passband USB signal can be written as

$$u_{USB}(t) = m(t)\cos(2\pi f_c t) - \check{m}(t)\sin(2\pi f_c t) \qquad (3.15)$$

Similarly, we can show that the Q component of an LSB signal is $-\check{m}(t)$, so that the passband LSB signal is given by

$$u_{LSB}(t) = m(t)\cos(2\pi f_c t) + \check{m}(t)\sin(2\pi f_c t) \qquad (3.16)$$

Figure 3.15: SSB modulation using the Hilbert transform of the message.


SSB  modulation:  Conceptually,  an  SSB  signal  can  be  generated  by  filtering  out  one  of  the 
sidebands  of  a  DSB-SC  signal.  However,  it  is  difficult  to  implement  the  required  sharp  cutoff 
at  fc,  especially  if  we  wish  to  preserve  the  information  contained  at  the  boundary  of  the  two 
sidebands,  which  corresponds  to  the  message  information  near  DC.  Thus,  an  implementation  of 
SSB based on sharp bandpass filters runs into trouble when the message has significant frequency content near DC. The representations in (3.15) and (3.16) provide an alternative approach to generating SSB signals, as shown in Figure 3.15. We have emphasized the role of 90° phase lags in generating the I and Q components, as well as the LO signals used for upconversion.
Figure 3.16: DSB and SSB spectra for a sinusoidal message.

Example 3.2.3 (SSB waveforms for a sinusoidal message): For a sinusoidal message $m(t) = \cos 2\pi f_m t$, we have $\check{m}(t) = \sin 2\pi f_m t$ from Example 3.2.2. Consider the DSB signal

$$u_{DSB}(t) = 2\cos 2\pi f_m t \,\cos 2\pi f_c t$$

where we have normalized the signal power to one: $\overline{u_{DSB}^2} = 1$. The DSB, USB and LSB spectra are shown in Figure 3.16. From the SSB spectra shown, we can immediately write down the following time domain expressions:

$$u_{USB}(t) = \cos 2\pi (f_c + f_m) t = \cos 2\pi f_m t \,\cos 2\pi f_c t - \sin 2\pi f_m t \,\sin 2\pi f_c t$$

$$u_{LSB}(t) = \cos 2\pi (f_c - f_m) t = \cos 2\pi f_m t \,\cos 2\pi f_c t + \sin 2\pi f_m t \,\sin 2\pi f_c t$$

The preceding equations are consistent with (3.15) and (3.16). For both the USB and LSB signals, the I component equals the message: $u_c(t) = m(t) = \cos 2\pi f_m t$. The Q component for the USB signal is $u_s(t) = \check{m}(t) = \sin 2\pi f_m t$, and the Q component for the LSB signal is $u_s(t) = -\check{m}(t) = -\sin 2\pi f_m t$.
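
The consistency between the spectra and the representations (3.15) and (3.16) can be confirmed sample by sample. In this Python sketch the helper name `ssb_sample` and the frequencies (20 Hz carrier, 3 Hz message) are hypothetical values chosen only to keep the numbers small.

```python
import math

def ssb_sample(m, m_check, fc, t, upper=True):
    """One sample of (3.15) when upper is True, or of (3.16) otherwise.

    m is the message sample and m_check its Hilbert transform at time t.
    """
    sign = -1.0 if upper else 1.0
    return (m * math.cos(2 * math.pi * fc * t)
            + sign * m_check * math.sin(2 * math.pi * fc * t))

fc, fm = 20.0, 3.0                          # illustrative frequencies in Hz
times = [k / 1000.0 for k in range(1000)]   # one second sampled at 1 kHz

def msg(t):
    return math.cos(2 * math.pi * fm * t)

def msg_check(t):
    return math.sin(2 * math.pi * fm * t)

# Compare against single tones at fc + fm (USB) and fc - fm (LSB).
max_err_usb = max(abs(ssb_sample(msg(t), msg_check(t), fc, t, True)
                      - math.cos(2 * math.pi * (fc + fm) * t)) for t in times)
max_err_lsb = max(abs(ssb_sample(msg(t), msg_check(t), fc, t, False)
                      - math.cos(2 * math.pi * (fc - fm) * t)) for t in times)
```

Both errors are at the level of floating point rounding, so each SSB waveform is a single tone at the sum or difference frequency, as the spectra indicate.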


SSB  demodulation:  We  know  now  that  the  message  can  be  recovered  from  an  SSB  signal 
by  extracting  its  I  component  using  a  coherent  demodulator  as  in  Figure  3.5.  The  difficulty  of 
coherent  demodulation  lies  in  the  requirement  for  carrier  synchronization,  and  we  have  discussed 
the  adverse  impact  of  imperfect  synchronization  for  DSB-SC  signals.  We  now  show  that  the 
performance  degradation  is  even  more  significant  for  SSB  signals.  Consider  a  USB  received 
signal  of  the  form  (ignoring  scale  factors): 

$$y_p(t) = m(t)\cos(2\pi f_c t + \theta_r) - \check{m}(t)\sin(2\pi f_c t + \theta_r) \qquad (3.17)$$

where $\theta_r$ is the phase offset with respect to the receiver LO. The complex envelope with respect to the receiver LO is given by

$$y(t) = (m(t) + j\check{m}(t))\,e^{j\theta_r} = (m(t) + j\check{m}(t))(\cos\theta_r + j\sin\theta_r)$$

Taking the real part, we obtain that the I component extracted by the coherent demodulator is

$$y_c(t) = m(t)\cos\theta_r - \check{m}(t)\sin\theta_r$$

Thus, as the phase error $\theta_r$ increases, not only do we get an attenuation in the first term corresponding to the desired message (as in DSB), but we also get interference due to the second term from the Hilbert transform of the message. Thus, for coherent demodulation, accurate carrier synchronization is even more crucial for SSB than for DSB.
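
To get a feel for how quickly the interference term grows, the following Python sketch evaluates the demodulator output for a given phase error. It also computes a signal-to-interference ratio under the assumption that the message and its Hilbert transform have equal power, which holds when the message has no DC content because the Hilbert transform leaves the magnitude spectrum unchanged. The function names and the 0.1 radian phase error are illustrative choices.

```python
import math

def ssb_coherent_output(m, m_check, theta_r):
    """I component seen by the coherent demodulator for phase error theta_r (radians)."""
    return m * math.cos(theta_r) - m_check * math.sin(theta_r)

def signal_to_interference_db(theta_r):
    """Desired-to-interfering power ratio in dB.

    Assumes the message and its Hilbert transform have equal power.
    """
    return 10.0 * math.log10((math.cos(theta_r) / math.sin(theta_r)) ** 2)

sir_db = signal_to_interference_db(0.1)

# Sinusoidal message: m = cos(x) with Hilbert transform sin(x).
# The output is then cos(x + theta_r), a phase-shifted copy of the message.
x, theta_r = 0.7, 0.1
y_c = ssb_coherent_output(math.cos(x), math.sin(x), theta_r)
```

A phase error of 0.1 radian already limits the signal-to-interference ratio to about 20 dB under this assumption.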

Noncoherent demodulation is also possible for SSB if we add a strong carrier term, as in conventional AM. Specifically, for a received signal given by

$$y_p(t) = (A + m(t))\cos(2\pi f_c t + \theta_r) \pm \check{m}(t)\sin(2\pi f_c t + \theta_r)$$

the envelope is given by

$$e(t) = \sqrt{(A + m(t))^2 + \check{m}^2(t)} \approx A + m(t) \qquad (3.18)$$

if $|A + m(t)| \gg |\check{m}(t)|$. Subject to the approximation in (3.18), an envelope detector works just as in conventional AM.
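
The quality of the approximation in (3.18) depends on how strong the added carrier is. A minimal Python sketch, assuming a unit-amplitude sinusoidal message (so that its Hilbert transform is the corresponding sine) and hypothetical function names, measures the worst-case envelope error for a weak and a strong carrier:

```python
import math

def ssb_envelope(a, m, m_check):
    """Exact envelope from (3.18) for carrier amplitude a."""
    return math.hypot(a + m, m_check)

def max_envelope_error(a, n=1000):
    """Largest |e(t) - (A + m(t))| over one period of m(t) = cos(x).

    The Hilbert transform of cos(x) is sin(x), per Example 3.2.2.
    """
    worst = 0.0
    for k in range(n):
        x = 2 * math.pi * k / n
        m, m_check = math.cos(x), math.sin(x)
        worst = max(worst, abs(ssb_envelope(a, m, m_check) - (a + m)))
    return worst

err_weak = max_envelope_error(2.0)      # carrier only twice the message peak
err_strong = max_envelope_error(20.0)   # carrier twenty times the message peak
```

For this message the worst-case error works out to $1/(2A)$, so it shrinks in inverse proportion to the carrier amplitude.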
Figure 3.17: Relevant passband and baseband spectra for VSB.


3.2.4  Vestigial  Sideband  (VSB)  Modulation 

VSB  is  similar  to  SSB,  in  that  it  also  tries  to  reduce  the  transmitted  bandwidth  relative  to  DSB, 
and  the  transmitted  signal  is  a  filtered  version  of  the  DSB  signal.  The  idea  is  to  mainly  transmit 
one  of  the  two  sidebands,  but  to  leave  a  vestige  of  the  other  sideband  in  order  to  ease  the  filtering 
requirements.  The  passband  filter  used  to  shape  the  DSB  signal  in  this  fashion  is  chosen  so  that 
the  I  component  of  the  transmitted  signal  equals  the  message.  To  see  this,  consider  the  DSB-SC 
signal 

$$2m(t)\cos 2\pi f_c t \leftrightarrow M(f - f_c) + M(f + f_c)$$

This is filtered by a passband VSB filter with transfer function $H_p(f)$, as shown in Figure 3.17, to obtain the transmitted signal with spectrum

$$U_{VSB}(f) = H_p(f)\left(M(f - f_c) + M(f + f_c)\right) \qquad (3.19)$$

A coherent demodulator extracting the I component passes $2u_{VSB}(t)\cos 2\pi f_c t$ through a lowpass filter. But

$$2u_{VSB}(t)\cos 2\pi f_c t \leftrightarrow U_{VSB}(f - f_c) + U_{VSB}(f + f_c)$$

which equals (substituting from (3.19))

$$H_p(f - f_c)\left(M(f - 2f_c) + M(f)\right) + H_p(f + f_c)\left(M(f) + M(f + 2f_c)\right) \qquad (3.20)$$

The $2f_c$ term, $H_p(f - f_c)M(f - 2f_c) + H_p(f + f_c)M(f + 2f_c)$, is filtered out by the lowpass filter. The output of the LPF is the lowpass term in (3.20), which equals the I component, and is given by

$$M(f)\left(H_p(f - f_c) + H_p(f + f_c)\right)$$

In order for this to equal (a scaled version of) the desired message, we must have

$$H_p(f + f_c) + H_p(f - f_c) = \text{constant}, \quad |f| \le W \qquad (3.21)$$

as shown in the example in Figure 3.17. To understand what this implies about the structure of the passband VSB filter, note that the filter impulse response can be written as $h_p(t) = h_c(t)\cos 2\pi f_c t - h_s(t)\sin 2\pi f_c t$, where $h_c(t)$ is obtained by passing $2h_p(t)\cos(2\pi f_c t)$ through a lowpass filter. But $2h_p(t)\cos(2\pi f_c t) \leftrightarrow H_p(f - f_c) + H_p(f + f_c)$. Thus, the Fourier transform involved in (3.21) is precisely the lowpass restriction of the spectrum of $2h_p(t)\cos(2\pi f_c t)$, i.e., it is $H_c(f)$. Thus, the correct demodulation condition for VSB in (3.21) is equivalent to requiring that $H_c(f)$ be constant over the message band. Further discussion of the structure of VSB signals is provided via problems.

As  with  SSB,  if  we  add  a  strong  carrier  component  to  the  VSB  signal,  we  can  demodulate  it 
noncoherently  using  an  envelope  detector,  again  at  the  cost  of  some  distortion  from  the  presence 
of  the  Q  component. 
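
Condition (3.21) can be illustrated with a simple filter shape. The Python sketch below uses a hypothetical idealized VSB filter with a real, even transfer function that rolls off linearly across the carrier frequency: it passes the upper sideband, and keeps a vestige of width `vestige` of the lower sideband. The numerical values and function names are assumptions made here; the filter is not bandlimited above the message band, since only its behavior for $|f| \le W$ matters for (3.21).

```python
def hp_positive(f, fc, vestige):
    """Hypothetical VSB filter response at a positive frequency f.

    Zero below fc - vestige, one above fc + vestige, linear in between.
    """
    d = f - fc
    if d <= -vestige:
        return 0.0
    if d >= vestige:
        return 1.0
    return (d + vestige) / (2.0 * vestige)

def hp(f, fc, vestige):
    """Real, even transfer function: Hp(-f) = Hp(f)."""
    return hp_positive(abs(f), fc, vestige)

fc, W, vestige = 100.0, 10.0, 2.0            # illustrative values
freqs = [-W + k * 0.1 for k in range(201)]   # message band, |f| <= W

# Left-hand side of (3.21) evaluated across the message band.
sums = [hp(f + fc, fc, vestige) + hp(f - fc, fc, vestige) for f in freqs]
```

Because the rolloff is odd-symmetric about the carrier frequency, the two shifted copies add to a constant over the message band, which is what (3.21) requires.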


3.2.5  Quadrature  Amplitude  Modulation 

The  transmitted  signal  in  quadrature  amplitude  modulation  (QAM)  is  of  the  form 

$$u_{QAM}(t) = m_c(t)\cos 2\pi f_c t - m_s(t)\sin 2\pi f_c t$$

where $m_c(t)$ and $m_s(t)$ are separate messages (unlike SSB and VSB, where the Q component is a transformation of the message carried by the I component). In other words, a complex-valued message $m(t) = m_c(t) + jm_s(t)$ is encoded in the complex envelope of the passband transmitted signal. QAM is extensively employed in digital communication, as we shall see in later chapters. It is also used to carry color information in analog TV.

Figure 3.18: Demodulation for quadrature amplitude modulation.


Demodulation is achieved using a coherent receiver which extracts both the I and Q components, as shown in Figure 3.18. If the received signal has a phase offset $\theta$ relative to the receiver’s LO, then we get both attenuation in the desired message and interference from the undesired message, as follows. Ignoring noise and scale factors, the reconstructed complex baseband message is given by

$$\hat{m}(t) = \hat{m}_c(t) + j\hat{m}_s(t) = (m_c(t) + jm_s(t))\,e^{j\theta(t)} = m(t)\,e^{j\theta(t)}$$

from which we conclude that

$$\hat{m}_c(t) = m_c(t)\cos\theta(t) - m_s(t)\sin\theta(t)$$

$$\hat{m}_s(t) = m_s(t)\cos\theta(t) + m_c(t)\sin\theta(t)$$

Thus, accurate carrier synchronization ($\theta(t)$ as close to zero as possible) is important for QAM demodulation to function properly.
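
The effect of a phase offset on QAM is simply a rotation of the complex message. The Python sketch below applies that rotation to one pair of message samples; the function name `qam_reconstruct` and the sample values are hypothetical.

```python
import cmath
import math

def qam_reconstruct(mc, ms, theta):
    """Reconstructed (I, Q) message samples for phase offset theta (radians)."""
    m_hat = complex(mc, ms) * cmath.exp(1j * theta)
    return m_hat.real, m_hat.imag

# Illustrative samples: mc = 1.0, ms = -0.5, phase offset of 10 degrees.
theta = math.radians(10.0)
mc_hat, ms_hat = qam_reconstruct(1.0, -0.5, theta)
```

Each reconstructed component contains the desired message scaled by $\cos\theta$ plus the other message scaled by $\pm\sin\theta$, matching the two expressions above.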

3.2.6 Concept synthesis for AM

Here is a worked problem that synthesizes a few of the concepts we have discussed for AM.

Figure 3.19: Spectrum of message and the corresponding AM signal in Example 3.2.4. Axes are not to scale.

Figure 3.20: Passband output of bandpass filter and its complex envelope with respect to 600 KHz reference, for Example 3.2.4. Axes are not to scale.

Example 3.2.4 The signal $m(t) = 2\cos 20\pi t - \cos 40\pi t$, where the unit of time is milliseconds, is amplitude modulated using a carrier frequency $f_c$ of 600 KHz. The AM signal is given by

$$x(t) = 5\cos 2\pi f_c t + m(t)\cos 2\pi f_c t$$

(a)  Sketch  the  magnitude  spectrum  of  x.  What  is  its  bandwidth? 

(b)  What  is  the  modulation  index? 

(c) The AM signal is passed through an ideal highpass filter with cutoff frequency 595 KHz (i.e.,
the  filter  passes  all  frequencies  above  595  KHz,  and  cuts  off  all  frequencies  below  595  KHz).  Find 
an  explicit  time  domain  expression  for  the  Q  component  of  the  filter  output  with  respect  to  a 
600  KHz  frequency  reference. 

Solution: (a) The message spectrum $M(f) = \delta(f - 10) + \delta(f + 10) - \frac{1}{2}\delta(f - 20) - \frac{1}{2}\delta(f + 20)$. The spectrum of the AM signal is given by

$$X(f) = \frac{5}{2}\delta(f - f_c) + \frac{5}{2}\delta(f + f_c) + \frac{1}{2}M(f - f_c) + \frac{1}{2}M(f + f_c)$$

These spectra are sketched in Figure 3.19.

(b) The modulation index $a_{mod} = M_0/A_c$, where $-M_0 = \min_t m(t)$. To simplify notation, let us minimize $g(x) = 2\cos x - \cos 2x$. We can actually do this by inspection: for $x = \pi$, $\cos x = -1$ and $\cos 2x = 1$, so that $\min_x g(x) = -3$. Alternatively, we could set the derivative to zero: $g'(x) = -2\sin x + 2\sin 2x = -2\sin x + 4\sin x\cos x = 2\sin x\,(-1 + 2\cos x) = 0$ is satisfied if $\sin x = 0$ (i.e., $\cos x = \pm 1$) or $\cos x = \frac{1}{2}$. We can check that the first solution with $\cos x = -1$ minimizes $g(x)$. Thus, we obtain $M_0 = 3$ and hence $a_{mod} = M_0/A_c = 3/5$ or 60%.

(c)  From  Figure  3.19,  it  is  clear  that  a  highpass  filter  with  cutoff  at  595  KHz  selects  the  USB 
signal  plus  the  carrier.  The  passband  output  has  spectrum  as  shown  in  Figure  3.20(a),  and  the 
complex  envelope  with  respect  to  600  KHz  is  shown  in  Figure  3.20(b).  Taking  the  inverse  Fourier 
transform,  the  time  domain  complex  envelope  is  given  by 

$$y(t) = 5 + e^{j20\pi t} - \frac{1}{2}e^{j40\pi t}$$

We can now find the Q component to be

$$y_s(t) = \mathrm{Im}(y(t)) = \sin 20\pi t - \frac{1}{2}\sin 40\pi t$$

where $t$ is in milliseconds. Another approach is to recognize that the Q component is the Q component of the USB signal, which is known to be a scaled version of the Hilbert transform of the message. Yet another approach is to find the Q component in the frequency domain using $jY_s(f) = (Y(f) - Y^*(-f))/2$ and then take the inverse Fourier transform. In this particular example, the first approach is probably the simplest.
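
Parts (b) and (c) of the example can be cross-checked numerically. The Python sketch below finds the modulation index by a grid search over one message period, and compares the imaginary part of the complex envelope with the closed form for the Q component. The grid size and function names are choices made here for illustration; time is in milliseconds throughout, as in the example.

```python
import cmath
import math

def message(t_ms):
    """Message of Example 3.2.4, with time in milliseconds."""
    return 2 * math.cos(20 * math.pi * t_ms) - math.cos(40 * math.pi * t_ms)

def filtered_envelope(t_ms):
    """Complex envelope of the highpass filter output, relative to 600 kHz."""
    return (5 + cmath.exp(1j * 20 * math.pi * t_ms)
            - 0.5 * cmath.exp(1j * 40 * math.pi * t_ms))

N = 10_000
period_ms = 0.1                       # the message repeats every 0.1 ms
grid = [k * period_ms / N for k in range(N)]

m0 = abs(min(message(t) for t in grid))
a_mod = m0 / 5.0                      # carrier amplitude is 5

# Q component from the envelope versus the closed form in the solution.
max_q_err = max(abs(filtered_envelope(t).imag
                    - (math.sin(20 * math.pi * t) - 0.5 * math.sin(40 * math.pi * t)))
                for t in grid)
```

The grid search reproduces the 60% modulation index, and the Q component agrees with the closed form to within rounding error.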


3.3  Angle  Modulation 


We know that a passband signal can be represented as $e(t)\cos(2\pi f_c t + \theta(t))$, where $e(t)$ is the envelope, and $\theta(t)$ is the phase. Let us define the instantaneous frequency offset relative to the carrier as

$$f(t) = \frac{1}{2\pi}\frac{d\theta(t)}{dt}$$

In frequency modulation (FM) and phase modulation (PM), we encode information into the phase $\theta(t)$, with the envelope remaining constant. The transmitted signal is given by

$$u(t) = A_c\cos(2\pi f_c t + \theta(t)), \quad \text{Angle Modulation (information carried in } \theta\text{)}$$

For a message $m(t)$, we have

$$\theta(t) = k_p m(t), \quad \text{Phase Modulation} \qquad (3.22)$$

and

$$\frac{1}{2\pi}\frac{d\theta(t)}{dt} = f(t) = k_f m(t), \quad \text{Frequency Modulation} \qquad (3.23)$$

where $k_p$, $k_f$ are constants. Integrating (3.23), the phase of the FM waveform is given by:

$$\theta(t) = \theta(0) + 2\pi k_f \int_0^t m(\tau)\,d\tau \qquad (3.24)$$


Comparing  (3.24)  with  (3.22),  we  see  that  FM  is  equivalent  to  PM  with  the  integral  of  the 
message.  Similarly,  for  differentiable  messages,  PM  can  be  interpreted  as  FM,  with  the  input 
to  the  FM  modulator  being  the  derivative  of  the  message.  Figure  3.21  provides  an  example 
illustrating  this  relationship;  this  is  actually  a  digital  modulation  scheme  called  continuous  phase 
modulation, as we shall see when we study digital communication. In this example, the digital message $+1, -1, -1, +1$ is the input to an FM modulator: the instantaneous frequency switches from $f_c + k_f$ (for one time unit) to $f_c - k_f$ (for two time units) and then back to $f_c + k_f$ again. The same waveform is produced when we feed the integral of the message into a PM modulator, as shown in the figure.

Figure 3.21: The equivalence of FM and PM: (a) messages used for angle modulation, (b) angle modulated signal.

Figure 3.22: Phase discontinuities in PM signal due to sharp message transitions: (a) digital input to phase modulator, (b) phase shift keyed signal.
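
The equivalence can be reproduced in discrete time. In the Python sketch below, the integral in (3.24) is approximated by a running sum, and the same digital message is fed to an FM modulator, its integral to a PM modulator, and the message itself to a PM modulator. The step size, the value of $k_f$, the PM constant, and all function names are hypothetical choices for illustration.

```python
import math

def running_integral(samples, dt):
    """Left Riemann sum approximating the integral from 0 to t."""
    out, total = [], 0.0
    for s in samples:
        out.append(total)
        total += s * dt
    return out

def fm_phase(message, kf, dt, theta0=0.0):
    """Discrete approximation of the FM phase in (3.24)."""
    return [theta0 + 2 * math.pi * kf * v for v in running_integral(message, dt)]

def pm_phase(message, kp):
    """PM phase from (3.22)."""
    return [kp * s for s in message]

def max_jump(phase):
    """Largest change in phase between consecutive samples."""
    return max(abs(b - a) for a, b in zip(phase, phase[1:]))

dt, kf = 0.01, 0.5
bits = [1, -1, -1, 1]
message = [float(b) for b in bits for _ in range(100)]   # each bit held one time unit

theta_fm = fm_phase(message, kf, dt)
theta_pm_of_integral = pm_phase(running_integral(message, dt), 2 * math.pi * kf)
theta_pm_direct = pm_phase(message, math.pi / 2)

max_diff = max(abs(a - b) for a, b in zip(theta_fm, theta_pm_of_integral))
```

The FM phase and the PM phase of the integrated message coincide, and both change by small steps, whereas direct PM of the message jumps by $2k_p$ at every sign change.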

When  the  digital  message  of  Figure  3.21  is  input  to  a  phase  modulator,  then  we  get  a  modulated 
waveform  with  phase  discontinuities  when  the  message  changes  sign.  This  is  in  contrast  to  the 
output  in  Figure  3.21,  where  the  phase  is  continuous.  That  is,  if  we  compare  FM  and  PM 
for  the  same  message,  we  infer  that  FM  waveforms  should  have  less  abrupt  phase  transitions 
due  to  the  smoothing  resulting  from  integration:  compare  the  expressions  for  the  phases  of  the 
modulated  signals  in  (3.22)  and  (3.24)  for  the  same  message  m(t).  Thus,  for  a  given  level  of 
message  variations,  we  expect  FM  to  have  smaller  bandwidth.  FM  is  therefore  preferred  to 
PM  for  analog  modulation,  where  the  communication  system  designer  does  not  have  control 
over  the  properties  of  the  message  signal  (e.g.,  the  system  designer  cannot  require  the  message 
to  be  smooth).  For  this  reason,  and  also  given  the  basic  equivalence  of  the  two  formats,  we 
restrict  the  discussion  in  the  remainder  of  this  section  to  FM  for  the  most  part.  PM,  however, 
is  extensively  employed  in  digital  communication,  where  the  system  designer  has  significant 
flexibility  in  shaping  the  message  signal.  In  this  context,  we  use  the  term  Phase  Shift  Keying 
(PSK)  to  denote  the  discrete  nature  of  the  information  encoded  in  the  message.  Figure  3.22  is 
actually  a  simple  example  of  PSK,  although  in  practice,  the  phase  of  the  modulated  signal  is 
shaped  to  be  smoother  in  order  to  improve  bandwidth  efficiency. 

Frequency Deviation and Modulation Index: The maximum deviation in instantaneous frequency due to a message $m(t)$ is given by

$$\Delta f_{max} = k_f \max_t |m(t)|$$

If the bandwidth of the message is $B$, the modulation index is defined as

$$\beta = \frac{\Delta f_{max}}{B} = \frac{k_f \max_t |m(t)|}{B}$$

We use the term narrowband FM if $\beta < 1$ (typically much smaller than one), and the term wideband FM if $\beta > 1$. We discuss the bandwidth occupancy of FM signals in more detail a little later, but note for now that the bandwidth of narrowband FM signals is dominated by that of the message, while the bandwidth of wideband FM signals is dominated by the frequency deviation.

Consider the FM signal corresponding to a sinusoidal message $m(t) = A_m \cos 2\pi f_m t$. The phase deviation due to this message is given by

$$\theta(t) = 2\pi k_f \int_0^t A_m \cos(2\pi f_m \tau)\,d\tau = \frac{A_m k_f}{f_m}\sin(2\pi f_m t)$$

Since the maximum frequency deviation $\Delta f_{max} = A_m k_f$ and the message bandwidth $B = f_m$, the modulation index is given by $\beta = \frac{A_m k_f}{f_m}$, so that the phase deviation can be written as

$$\theta(t) = \beta \sin 2\pi f_m t \qquad (3.25)$$
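
The definition of the modulation index translates directly into code. The Python sketch below computes $\beta$ and applies the narrowband and wideband labels used above. The function names and the numerical values (a 75 KHz peak deviation with a 15 KHz tone, and a 1 KHz deviation with a 10 KHz tone) are illustrative assumptions; the text does not say how to label $\beta = 1$ exactly, so the sketch arbitrarily treats it as wideband.

```python
def fm_modulation_index(kf, peak_amplitude, bandwidth_hz):
    """Modulation index: maximum frequency deviation divided by message bandwidth."""
    return kf * peak_amplitude / bandwidth_hz

def classify_fm(beta):
    """Label from the text; beta equal to 1 is treated as wideband here by assumption."""
    return "narrowband" if beta < 1 else "wideband"

# Sinusoidal message with unit amplitude and fm = 15 kHz, with kf = 75 kHz per unit.
beta_wide = fm_modulation_index(75e3, 1.0, 15e3)

# A much smaller deviation: kf = 1 kHz per unit with fm = 10 kHz.
beta_narrow = fm_modulation_index(1e3, 1.0, 10e3)
```

For a sinusoidal message, $\beta$ is also the peak phase deviation in radians, as (3.25) shows.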

Modulation:  An  FM  modulator,  by  definition,  is  a  Voltage  Controlled  Oscillator  (VCO),  whose 
output  is  a  sinusoidal  wave  whose  instantaneous  frequency  offset  from  a  reference  frequency  is 
proportional  to  the  input  signal.  VCO  implementations  are  often  based  on  the  use  of  varactor 
diodes,  which  provide  voltage-controlled  capacitance,  in  LC  tuned  circuits.  This  is  termed  direct 
FM  modulation,  in  that  the  output  of  the  VCO  produces  a  passband  signal  with  the  desired 
frequency  deviation  as  a  function  of  the  message.  The  VCO  output  may  be  at  the  desired  carrier 
frequency,  or  at  an  intermediate  frequency.  In  the  latter  scenario,  it  must  be  upconverted  further 
to the carrier frequency, but this operation does not change the frequency modulation. Direct
FM  modulation  may  be  employed  for  both  narrowband  and  wideband  modulation. 

An  alternative  approach  to  wideband  modulation  is  to  first  generate  a  narrowband  FM  signal 
(typically  using  a  phase  modulator),  and  to  then  multiply  the  frequency  (often  over  multiple 
stages) using nonlinearities, thus increasing the frequency deviation as well as the carrier frequency. This method, which is termed indirect FM modulation, is of historical importance, but is
not  used  in  present-day  FM  systems  because  direct  modulation  for  wideband  FM  is  now  feasible 
and  cost-effective. 

Demodulation: Many different approaches to FM demodulation have evolved over the past century. Here we discuss two important classes of demodulators: the limiter-discriminator demodulator in Section 3.3.1, and the phase locked loop in Section 3.5.


3.3.1  Limiter-Discriminator  Demodulation 


Figure 3.23: Limiter-Discriminator Demodulation of FM.


The task of an FM demodulator is to convert frequency variations in the passband received signal into amplitude variations, thus recovering an estimate of the message. Ideally, therefore, an FM demodulator would produce the derivative of the phase of the received signal; this is termed a discriminator, as shown in Figure 3.23. While an ideal FM signal as in (3.26) does not have amplitude fluctuations, noise and channel distortions might create such fluctuations, which lead to unwanted contributions to the discriminator output. In practice, therefore, as shown in the figure, the discriminator is typically preceded by a limiter, which removes these amplitude fluctuations. This is achieved by passing the modulated sinusoidal waveform through a hardlimiter, which generates a square wave, and then selecting the right harmonic using a bandpass filter tuned to the carrier frequency. The overall structure is termed a limiter-discriminator.

Ideal limiter-discriminator: Following the limiter, we have an FM signal of the form:

$$y_p(t) = A\cos(2\pi f_c t + \theta(t))$$

where $\theta(t)$ may include contributions due to channel and noise impairments (to be discussed later), as well as the angle modulation due to the message. An ideal discriminator now produces the output $\frac{d\theta(t)}{dt}$ (where we ignore scaling factors).

Figure 3.24: A crude discriminator based on differentiation and envelope detection.

A crude realization of a discriminator, which converts fluctuations in frequency to fluctuations in envelope, is shown in Figure 3.24. Taking the derivative of the FM signal

$$u_{FM}(t) = A_c \cos\left(2\pi f_c t + 2\pi k_f \int_0^t m(\tau)\,d\tau + \theta_0\right) \tag{3.26}$$

we have

$$v(t) = \frac{du_{FM}(t)}{dt} = -A_c \left(2\pi f_c + 2\pi k_f m(t)\right) \sin\left(2\pi f_c t + 2\pi k_f \int_0^t m(\tau)\,d\tau + \theta_0\right)$$

The envelope of $v(t)$ is $2\pi A_c |f_c + k_f m(t)|$. Noting that $k_f m(t)$ is the instantaneous frequency deviation from the carrier, whose magnitude is much smaller than $f_c$ for a properly designed system, we realize that $f_c + k_f m(t) > 0$ for all $t$. Thus, the envelope equals $2\pi A_c (f_c + k_f m(t))$, so that passing the discriminator output through an envelope detector yields a scaled and DC-shifted version of the message. Using AC coupling to reject the DC term, we obtain a scaled version of the message $m(t)$, just as in conventional AM.
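The differentiate-then-envelope-detect idea can be checked numerically. The following is a minimal sketch, not a circuit model: it assumes a sinusoidal message $m(t) = \cos 2\pi f_m t$, a central difference in place of the differentiator, a per-carrier-period peak picker in place of the envelope detector, and subtraction of the known carrier frequency in place of AC coupling. The function names and all numerical values (100 KHz carrier, 5 KHz peak deviation, 1 KHz tone, 10 MHz sampling rate) are hypothetical choices made for illustration.

```python
import math


def fm_sample(t, fc, kf, fm, ac=1.0):
    """FM signal of (3.26) for the message m(t) = cos(2*pi*fm*t), with theta_0 = 0."""
    # 2*pi*kf * integral_0^t cos(2*pi*fm*tau) dtau = (kf/fm) * sin(2*pi*fm*t)
    theta = (kf / fm) * math.sin(2 * math.pi * fm * t)
    return ac * math.cos(2 * math.pi * fc * t + theta)


def crude_discriminator(fc, kf, fm, fs, duration, ac=1.0):
    """Differentiate, envelope-detect, and remove the DC term.

    Returns (time, estimate of kf*m(time)) pairs, one per nominal carrier period.
    """
    h = 1.0 / fs
    n = int(round(duration * fs))
    # Differentiator: central-difference approximation of v(t) = du/dt.
    v = [
        (fm_sample((k + 1) * h, fc, kf, fm, ac)
         - fm_sample((k - 1) * h, fc, kf, fm, ac)) / (2 * h)
        for k in range(n)
    ]
    # Envelope detector: peak of |v| over each nominal carrier period.
    win = int(round(fs / fc))
    estimates = []
    for start in range(0, n - win + 1, win):
        envelope = max(abs(x) for x in v[start:start + win])
        t_mid = (start + win / 2) * h
        # The envelope is 2*pi*Ac*(fc + kf*m(t)); subtracting fc stands in
        # for the AC coupling that rejects the DC term.
        estimates.append((t_mid, envelope / (2 * math.pi * ac) - fc))
    return estimates


est = crude_discriminator(fc=100e3, kf=5e3, fm=1e3, fs=10e6, duration=2e-3)
worst = max(abs(f_hat - 5e3 * math.cos(2 * math.pi * 1e3 * t)) for t, f_hat in est)
print(f"worst-case error: {worst:.0f} Hz out of a 5000 Hz peak deviation")
```

The recovered values track $k_f m(t)$ closely because the carrier frequency is much larger than both the deviation and the message frequency, which is the same condition the argument above relies on.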


Figure 3.25: Slope detector using a tuned circuit offset from resonance.

The discriminator as described above corresponds to the frequency domain transfer function $H(f) = j2\pi f$, and can therefore be approximated (up to DC offsets) by transfer functions that are approximately linear over the FM band of interest. An example of such a slope detector is given in Figure 3.25, where the carrier frequency $f_c$ is chosen at an offset from the resonance frequency $f_0$ of a tuned circuit.

One  problem  with  the  simple  discriminator  and  its  approximations  is  that  the  envelope  detector 
output  has  a  significant  DC  component:  when  we  get  rid  of  this  using  AC  coupling,  we  also 
attenuate  low  frequency  components  near  DC.  This  limitation  can  be  overcome  by  employing 
circuits  that  rely  on  the  approximately  linear  variations  in  amplitude  and  phase  of  tuned  circuits 
around  resonance  to  synthesize  approximations  to  an  ideal  discriminator  whose  output  is  the 
derivative of the phase. These include the Foster-Seeley detector and the ratio detector. Circuit-level details of such implementations are beyond our scope.


3.3.2  FM  Spectrum 

We first consider a naive but useful estimate of FM bandwidth termed Carson’s rule. We then show that the spectral properties of FM are actually quite complicated, even for a simple sinusoidal message, and outline methods of obtaining more detailed bandwidth estimates.

Consider an angle modulated signal, $u_p(t) = A_c \cos(2\pi f_c t + \theta(t))$, where $\theta(t)$ contains the message information. For a baseband message $m(t)$ of bandwidth $B$, the phase $\theta(t)$ for PM is also a baseband signal with the same bandwidth. The phase $\theta(t)$ for FM is the integral of the message. Since integration smooths out the time domain signal, or equivalently, attenuates higher frequencies, $\theta(t)$ is a baseband signal with bandwidth at most $B$. We therefore loosely think of $\theta(t)$ as having a bandwidth equal to $B$, the message bandwidth, for the remainder of this section.

The complex envelope of $u_p$ with respect to $f_c$ is given by

$$u(t) = A_c e^{j\theta(t)} = A_c \cos\theta(t) + jA_c \sin\theta(t)$$

Now, if $|\theta(t)|$ is small, as is the case for narrowband angle modulation, then $\cos\theta(t) \approx 1$ and $\sin\theta(t) \approx \theta(t)$, so that the complex envelope is approximately given by

$$u(t) \approx A_c + jA_c\theta(t)$$

Thus, the passband signal is approximately given by

$$u_p(t) \approx A_c \cos 2\pi f_c t - \theta(t) A_c \sin 2\pi f_c t$$
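To get a feel for how good the narrowband approximation is, the following sketch compares the exact angle modulated signal with the approximation above for a sinusoidal phase $\theta(t) = \beta \sin 2\pi f_m t$. The function names, the carrier and message frequencies, and the sampling rate are illustrative assumptions. Since $|\cos\theta - 1| \le \theta^2/2$ and $|\sin\theta - \theta| \le |\theta|^3/6$, we expect the worst-case error to be roughly $A_c\beta^2/2$ for small $\beta$.

```python
import math


def exact_and_approx(t, fc, fm, beta, ac=1.0):
    """Return (exact sample, narrowband approximation) of the passband signal."""
    theta = beta * math.sin(2 * math.pi * fm * t)
    carrier_phase = 2 * math.pi * fc * t
    exact = ac * math.cos(carrier_phase + theta)
    approx = ac * math.cos(carrier_phase) - theta * ac * math.sin(carrier_phase)
    return exact, approx


def worst_error(beta, fc=100e3, fm=1e3, fs=2e6, duration=2e-3):
    """Largest |exact - approx| over two message periods (hypothetical parameters)."""
    n = int(round(duration * fs))
    pairs = (exact_and_approx(k / fs, fc, fm, beta) for k in range(n))
    return max(abs(e - a) for e, a in pairs)


for beta in (0.05, 0.1, 0.5):
    print(f"beta = {beta}: worst-case error = {worst_error(beta):.5f}")
```

The error is negligible for the smaller phase deviations and becomes significant as $\beta$ approaches 0.5, which is where the narrowband picture starts to break down.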

Thus, the I component has a large unmodulated carrier contribution as in conventional AM, but the message information is now in the Q component, rather than in the I component as in AM. The Fourier transform is given by

$$U_p(f) = \frac{A_c}{2}\left(\delta(f - f_c) + \delta(f + f_c)\right) - \frac{A_c}{2j}\left(\Theta(f - f_c) - \Theta(f + f_c)\right)$$

where $\Theta(f)$ denotes the Fourier transform of $\theta(t)$. The magnitude spectrum is therefore given by

$$|U_p(f)| = \frac{A_c}{2}\left(\delta(f - f_c) + \delta(f + f_c)\right) + \frac{A_c}{2}\left(|\Theta(f - f_c)| + |\Theta(f + f_c)|\right) \tag{3.27}$$

Thus, the bandwidth of a narrowband FM signal is $2B$, or twice the message bandwidth, just as in AM. For example, narrowband angle modulation with a sinusoidal message $m(t) = \cos 2\pi f_m t$ occupies a bandwidth of $2f_m$: $\theta(t) = \frac{k_f}{f_m} \sin 2\pi f_m t$ for FM, and $\theta(t) = k_p \cos 2\pi f_m t$ for PM.

For wideband FM, we would expect the bandwidth to be dominated by the frequency deviation $k_f m(t)$. For messages that have positive and negative peaks of similar size, the frequency deviation ranges between $-\Delta f_{max}$ and $\Delta f_{max}$, where $\Delta f_{max} = k_f \max_t |m(t)|$. In this case, we expect the bandwidth to be dominated by the instantaneous deviations around the carrier frequency, which span an interval of length $2\Delta f_{max}$.

Carson’s rule: This is an estimate for the bandwidth of a general FM signal, based on simply adding up the estimates from our separate discussion of narrowband and wideband modulation:

$$B_{FM} \approx 2B + 2\Delta f_{max} = 2B(\beta + 1) \quad \text{(Carson's rule)} \tag{3.28}$$

where $\beta = \Delta f_{max}/B$ is the modulation index, also called the FM deviation ratio, defined earlier.
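Carson's rule (3.28) is simple enough to evaluate by hand, but a small helper makes it easy to compare operating points. This is only a sketch: the function name and the numbers below (a 50 KHz peak deviation with a 10 KHz message bandwidth, and a 1 KHz peak deviation with the same message) are illustrative values.

```python
def carson_bandwidth(max_deviation_hz, message_bandwidth_hz):
    """Carson's rule (3.28): B_FM ~ 2*B + 2*delta_f_max = 2*B*(beta + 1).

    Returns (estimated FM bandwidth in Hz, modulation index beta).
    """
    beta = max_deviation_hz / message_bandwidth_hz
    return 2 * message_bandwidth_hz * (beta + 1), beta


# Wideband case: the deviation term dominates.
bw_wide, beta_wide = carson_bandwidth(max_deviation_hz=50e3, message_bandwidth_hz=10e3)
# Narrowband case: the estimate is close to 2B, as for AM.
bw_narrow, beta_narrow = carson_bandwidth(max_deviation_hz=1e3, message_bandwidth_hz=10e3)

print(f"beta = {beta_wide:.1f}: B_FM is about {bw_wide / 1e3:.0f} KHz")
print(f"beta = {beta_narrow:.1f}: B_FM is about {bw_narrow / 1e3:.0f} KHz")
```

For $\beta \ll 1$ the estimate approaches the narrowband value $2B$, while for $\beta \gg 1$ it approaches $2\Delta f_{max}$, so the rule interpolates between the two regimes discussed above.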

FM Spectrum for a Sinusoidal Message: In order to get more detailed insight into what the spectrum of an FM signal looks like, let us now consider the example of a sinusoidal message, for which the phase deviation is given by $\theta(t) = \beta \sin 2\pi f_m t$, from (3.25). The complex envelope of the FM signal with respect to $f_c$ is given by

$$u(t) = e^{j\theta(t)} = e^{j\beta \sin 2\pi f_m t}$$

Since the sinusoid in the exponent is periodic with period $\frac{1}{f_m}$, so is $u(t)$. It can therefore be expanded into a Fourier series of the form

$$u(t) = \sum_{n=-\infty}^{\infty} u[n] e^{j2\pi n f_m t}$$

where the Fourier coefficients $\{u[n]\}$ are given by

$$u[n] = f_m \int_{-\frac{1}{2f_m}}^{\frac{1}{2f_m}} u(t) e^{-j2\pi n f_m t}\,dt = f_m \int_{-\frac{1}{2f_m}}^{\frac{1}{2f_m}} e^{j\beta \sin 2\pi f_m t} e^{-j2\pi n f_m t}\,dt$$

Using the change of variables $2\pi f_m t = x$, we have

$$u[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{j(\beta \sin x - nx)}\,dx = J_n(\beta)$$

where $J_n(\cdot)$ is the Bessel function of the first kind of order $n$. While the integrand above is complex-valued, the integral is real-valued. To see this, use Euler’s formula:

$$e^{j(\beta \sin x - nx)} = \cos(\beta \sin x - nx) + j \sin(\beta \sin x - nx)$$

Since $\beta \sin x - nx$ and the sine function are both odd, the imaginary term $\sin(\beta \sin x - nx)$ above is an odd function, and integrates out to zero over $[-\pi, \pi]$. The real part is even, hence the integral over $[-\pi, \pi]$ is simply twice that over $[0, \pi]$. We summarize as follows:

$$u[n] = J_n(\beta) = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{j(\beta \sin x - nx)}\,dx = \frac{1}{\pi} \int_{0}^{\pi} \cos(\beta \sin x - nx)\,dx \tag{3.29}$$


Figure 3.26: Bessel functions of the first kind, $J_n(\beta)$ versus $\beta$, for $n = 0, 1, 2, 3$.


Bessel functions are available in mathematical software packages such as Matlab and Mathematica. Figure 3.26 shows some Bessel function plots. Some properties of Bessel functions worth noting are as follows:

• For $n$ integer, $J_n(\beta) = (-1)^n J_{-n}(\beta) = (-1)^n J_n(-\beta)$.

• For fixed $\beta$, $J_n(\beta)$ tends to zero fast as $n$ gets large, so that the complex envelope is well approximated by a finite number of Fourier series components. In particular, a good approximation is that $J_n(\beta)$ is small for $|n| > \beta + 1$. This leads to an approximation for the bandwidth of the FM signal given by $2(\beta + 1)f_m$, which is consistent with Carson’s rule.

• For fixed $n$, $J_n(\beta)$ vanishes for specific values of $\beta$, a fact that can be used for spectral shaping.
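When a Bessel function routine is not at hand, the integral in (3.29) can be evaluated directly. The sketch below uses the trapezoidal rule, which works well here because, for integer $n$, the integrand extends to a smooth, even, $2\pi$-periodic function. The function name `bessel_j` and the step count are our own choices, not part of any library. The printed values illustrate the properties listed above.

```python
import math


def bessel_j(n, beta, steps=400):
    """J_n(beta) for integer n, from (3.29):
    (1/pi) * integral over [0, pi] of cos(beta*sin(x) - n*x) dx,
    evaluated with the trapezoidal rule.
    """
    h = math.pi / steps

    def integrand(x):
        return math.cos(beta * math.sin(x) - n * x)

    total = 0.5 * (integrand(0.0) + integrand(math.pi))
    total += sum(integrand(k * h) for k in range(1, steps))
    return total * h / math.pi


# Symmetry: J_{-n}(beta) = (-1)^n J_n(beta), shown here for n = 3.
print(bessel_j(3, 2.0), bessel_j(-3, 2.0))

# Decay in n for fixed beta = 5: coefficients shrink once n exceeds beta + 1.
for n in range(0, 10):
    print(n, round(bessel_j(n, 5.0), 4))

# Zero crossing: J_0 vanishes near beta = 2.4048, so the carrier component disappears there.
print(bessel_j(0, 2.404825557695773))
```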


To summarize, the complex envelope of an FM signal modulated by a sinusoidal message can be written as

$$u(t) = e^{j\beta \sin 2\pi f_m t} = \sum_{n=-\infty}^{\infty} J_n(\beta) e^{j2\pi n f_m t} \tag{3.30}$$

The corresponding spectrum is given by

$$U(f) = \sum_{n=-\infty}^{\infty} J_n(\beta)\,\delta(f - n f_m) \tag{3.31}$$

Noting that $|J_{-n}(\beta)| = |J_n(\beta)|$, the complex envelope has discrete frequency components at $\pm n f_m$ of strength $|J_n(\beta)|$: these correspond to frequency components at $f_c \pm n f_m$ in the passband FM signal.

Fractional power containment bandwidth: By Parseval’s identity for Fourier series, the power of the complex envelope is given by

$$1 = |u(t)|^2 = \overline{|u(t)|^2} = \sum_{n=-\infty}^{\infty} J_n^2(\beta) = J_0^2(\beta) + 2\sum_{n=1}^{\infty} J_n^2(\beta)$$

We can therefore compute the fractional power containment bandwidth as $2Kf_m$, where $K \geq 1$ is the smallest integer such that

$$J_0^2(\beta) + 2\sum_{n=1}^{K} J_n^2(\beta) \geq \alpha$$

where $\alpha$ is the desired fraction of power within the band (e.g., $\alpha = 0.99$ for the 99% power containment bandwidth). For integer values of $\beta = 1, ..., 10$, we find that $K = \beta + 1$ provides a good approximation to the 99% power containment bandwidth, which is again consistent with Carson’s formula.
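The search for $K$ is easy to automate. The sketch below evaluates $J_n(\beta)$ by numerically integrating (3.29) and accumulates power until the fraction $\alpha$ is reached. The function names, the search limit `k_max`, and the 10 KHz tone frequency used to convert $K$ into a bandwidth are assumptions made for illustration.

```python
import math


def bessel_j(n, beta, steps=400):
    """J_n(beta) for integer n via trapezoidal integration of (3.29)."""
    h = math.pi / steps

    def integrand(x):
        return math.cos(beta * math.sin(x) - n * x)

    total = 0.5 * (integrand(0.0) + integrand(math.pi))
    total += sum(integrand(k * h) for k in range(1, steps))
    return total * h / math.pi


def containment_harmonics(beta, alpha=0.99, k_max=200):
    """Smallest K >= 1 with J_0^2(beta) + 2 * sum_{n=1}^{K} J_n^2(beta) >= alpha."""
    power = bessel_j(0, beta) ** 2
    for k in range(1, k_max + 1):
        power += 2 * bessel_j(k, beta) ** 2
        if power >= alpha:
            return k
    raise ValueError("alpha not reached; increase k_max")


fm = 10e3  # hypothetical tone frequency in Hz
for beta in (1, 2, 5):
    k = containment_harmonics(beta)
    print(f"beta = {beta}: K = {k}, 99% bandwidth = {2 * k * fm / 1e3:.0f} KHz, "
          f"Carson = {2 * (beta + 1) * fm / 1e3:.0f} KHz")
```

For the three values tried here, the search returns $K = \beta + 1$, so the 99% power containment bandwidth coincides with the Carson estimate $2(\beta + 1)f_m$.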


3.3.3  Concept  synthesis  for  FM 


Figure 3.27: Input to the VCO in Example 3.3.1 (time axis in microseconds).


The following worked problem brings together some of the concepts we have discussed regarding FM.

Example 3.3.1 The signal $a(t)$ shown in Figure 3.27 is fed to a VCO with quiescent frequency of 5 MHz and frequency deviation of 25 KHz/mV. Denote the output of the VCO by $y(t)$.

(a) Provide an estimate of the bandwidth of $y$. Clearly state the assumptions that you make.

(b) The signal $y(t)$ is passed through an ideal bandpass filter of bandwidth 5 KHz, centered at 5.005 MHz. Provide the simplest possible expression for the power at the filter output (if you can give a numerical answer, do so).

Solution: (a) The VCO output is an FM signal with message $m(t) = a(t)$ and

$$\Delta f_{max} = k_f \max_t |m(t)| = 25 \text{ KHz/mV} \times 2 \text{ mV} = 50 \text{ KHz}$$

The message is periodic with period 100 microseconds, hence its fundamental frequency is 10 KHz. Approximating its bandwidth by its first harmonic, we have $B \approx 10$ KHz. Using Carson’s formula, we can approximate the bandwidth of the FM signal at the VCO output as

$$B_{FM} \approx 2\Delta f_{max} + 2B = 120 \text{ KHz}$$

(b) The complex envelope of the VCO output is given by $e^{j\theta(t)}$, where

$$\theta(t) = 2\pi k_f \int_0^t m(\tau)\,d\tau$$

For periodic messages with zero DC value (as is the case for $m(t)$ here), $\theta(t)$, and hence $e^{j\theta(t)}$, has the same period as the message. We can therefore express the complex envelope as a Fourier series with complex exponentials at frequencies $nf_m$, where $f_m = 10$ KHz is the fundamental frequency for the message, and where $n$ takes integer values. Thus, the FM signal has discrete components at $f_c + nf_m$, where $f_c = 5$ MHz in this example. A bandpass filter at 5.005 MHz with bandwidth 5 KHz does not capture any of these components, since it spans the interval [5.0025, 5.0075] MHz, whereas the nearest Fourier components are at 5 MHz and 5.01 MHz. Thus, the power at the output of the bandpass filter is zero.
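The argument in part (b) amounts to checking which spectral lines $f_c + nf_m$ fall inside the filter passband. A minimal sketch of that check follows, with frequencies in Hz. The function name and the harmonic search limit are hypothetical, and the second call (a filter centered at 5.01 MHz) is an added contrast case that is not part of the example.

```python
def components_in_passband(fc_hz, fm_hz, center_hz, bandwidth_hz, max_harmonic=50):
    """List the spectral lines fc + n*fm that lie inside an ideal bandpass filter."""
    low = center_hz - bandwidth_hz / 2
    high = center_hz + bandwidth_hz / 2
    lines = (fc_hz + n * fm_hz for n in range(-max_harmonic, max_harmonic + 1))
    return [f for f in lines if low <= f <= high]


# Filter from Example 3.3.1(b): passband [5.0025, 5.0075] MHz.
captured = components_in_passband(5_000_000, 10_000, 5_005_000, 5_000)
print(captured)  # [] : no spectral line falls in the passband, so the output power is zero

# Contrast case (our own): the same filter centered on the n = 1 line.
shifted = components_in_passband(5_000_000, 10_000, 5_010_000, 5_000)
print(shifted)  # [5010000]
```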


3.4  The  Superheterodyne  Receiver 

The receiver in a radio communication system must downconvert the passband received signal to baseband in order to recover the message. At the turn of the twentieth century, it was difficult to produce amplification at frequencies beyond a few MHz using the vacuum tube
technology  of  that  time.  However,  higher  carrier  frequencies  are  desirable  because  of  the  larger 
available  bandwidths  and  the  smaller  antennas  required.  The  invention  of  the  superheterodyne, 
or  superhet,  receiver  was  motivated  by  these  considerations.  Basically,  the  idea  is  to  use  sloppy 
design  for  front  end  filtering  of  the  received  radio  frequency  (RF)  signal,  and  for  translating  it  to  a 
lower  intermediate  frequency  (IF).  The  IF  signal  is  then  processed  using  carefully  designed  filters 
and  amplifiers.  Subsequently,  the  IF  signal  can  be  converted  to  baseband  in  a  number  of  different 
ways:  for  example,  an  envelope  detector  for  AM  radio,  a  phase  locked  loop  or  discriminator  for 
FM  radio,  and  a  coherent  quadrature  demodulator  for  digital  cellular  telephone  receivers. 

While  the  original  motivation  for  the  superheterodyne  receiver  is  no  longer  strictly  applicable 
(modern  analog  electronics  are  capable  of  providing  amplification  at  the  carrier  frequencies  in 
commercial  use),  it  is  still  true  that  gain  is  easier  to  provide  at  lower  frequencies  than  at  higher 
frequencies.  Furthermore,  it  becomes  possible  to  closely  optimize  the  processing  at  a  fixed  IF 
(in  terms  of  amplifier  and  filter  design),  while  permitting  a  tunable  RF  front  end  with  more 
relaxed  specifications,  which  is  important  for  the  design  of  radios  that  operate  over  a  wide 
range of carrier frequencies. For example, the superhet architecture is commonly employed for AM and FM broadcast radio receivers, where the RF front end tunes to the desired station, translating the received signal to a fixed IF. Radio receivers built with discrete components often take advantage of the widespread availability of inexpensive filters at certain commonly used intermediate frequencies, such as 455 KHz (used for AM radio) and 10.7 MHz (used for FM radio). As
carrier  frequencies  scale  up  to  the  GHz  range  (as  is  the  case  for  modern  digital  cellular  and 
wireless  local  area  network  transceivers),  circuit  components  shrink  with  the  carrier  wavelength, 
and  it  becomes  possible  to  implement  RF  amplifiers  and  filters  using  integrated  circuits.  In  such 
settings,  a  direct  conversion  architecture,  in  which  the  passband  signal  is  directly  translated  to 
baseband,  becomes  increasingly  attractive,  as  discussed  later  in  this  section. 

The key element in frequency translation is a mixer, which multiplies two input signals. For our present purpose, one of these inputs is a passband received signal $A(t)\cos(2\pi f_{RF} t + \theta(t))$, where the envelope $A(t)$ and phase $\theta(t)$ are baseband signals that contain message information. The second input is a local oscillator (LO) signal, which is a locally generated sinusoid $\cos(2\pi f_{LO} t)$ (we set the LO phase to zero without loss of generality, effectively adopting it as our phase reference). Suppressing the time dependence of $A$ and $\theta$, the output of the mixer is therefore given by

$$A \cos(2\pi f_{RF} t + \theta) \cos(2\pi f_{LO} t) = \frac{A}{2} \cos\left(2\pi (f_{RF} - f_{LO}) t + \theta\right) + \frac{A}{2} \cos\left(2\pi (f_{RF} + f_{LO}) t + \theta\right)$$

Thus, there are two frequency components at the output of the mixer, $f_{RF} + f_{LO}$ and $|f_{RF} - f_{LO}|$ (remember that we only need to talk about positive frequencies when discussing physically realizable signals, due to the conjugate symmetry of the Fourier transform of real-valued time signals). In the superhet receiver, we set one of these as our IF, typically the difference frequency: $f_{IF} = |f_{RF} - f_{LO}|$.

Figure 3.28: Generic block diagram for a superhet receiver.

Figure 3.29: A superhet AM receiver.


For a given RF and a fixed IF, we therefore have two choices of LO frequency when $f_{IF} = |f_{RF} - f_{LO}|$: $f_{LO} = f_{RF} - f_{IF}$ and $f_{LO} = f_{RF} + f_{IF}$. To continue the discussion, let us consider the example of AM broadcast radio, which operates over the band from 540 to 1600 KHz, with 10 KHz spacing between the carrier frequencies for different stations. The audio message signal is limited to 5 KHz bandwidth, modulated using conventional AM to obtain an RF signal of bandwidth 10 KHz. Figure 3.29 shows a block diagram for the superhet architecture commonly used in AM receivers. The RF bandpass filter must be tuned to the carrier frequency for the desired station, and at the same time, the LO frequency into the mixer must be chosen so that the difference frequency equals the IF of 455 KHz. If $f_{LO} = f_{RF} + f_{IF}$, then the LO frequency ranges from 995 to 2055 KHz, corresponding to an approximately 2-fold variation in tuning range. If $f_{LO} = f_{RF} - f_{IF}$, then the LO frequency ranges from 85 to 1145 KHz, corresponding to more than 13-fold variation in tuning range. The first choice is therefore preferred, because it is easier to implement a tunable oscillator over a smaller tuning range.
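The tuning-range comparison is a matter of simple arithmetic, sketched below for the AM broadcast numbers quoted above (all frequencies in KHz). The function name and its `high_side` flag are hypothetical names introduced for this illustration.

```python
def lo_tuning_range(rf_min, rf_max, f_if, high_side=True):
    """LO range needed to cover [rf_min, rf_max] with a fixed IF.

    high_side=True  uses f_LO = f_RF + f_IF,
    high_side=False uses f_LO = f_RF - f_IF.
    Returns (lowest LO, highest LO, ratio of highest to lowest).
    """
    sign = 1 if high_side else -1
    lo_min = rf_min + sign * f_if
    lo_max = rf_max + sign * f_if
    return lo_min, lo_max, lo_max / lo_min


for high_side in (True, False):
    lo_min, lo_max, ratio = lo_tuning_range(540, 1600, 455, high_side)
    print(f"high_side={high_side}: {lo_min} to {lo_max} KHz, ratio {ratio:.2f}")
# high_side=True: 995 to 2055 KHz, ratio 2.07
# high_side=False: 85 to 1145 KHz, ratio 13.47
```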


Figure 3.30: The role of image rejection and channel selection in superhet receivers.


Having fixed the LO frequency, we have a desired signal at $f_{RF} = f_{LO} - f_{IF}$ that leads to a component at IF, and potentially an undesired image frequency at $f_{IM} = f_{LO} + f_{IF} = f_{RF} + 2f_{IF}$ that also leads to a component at IF. The job of the RF bandpass filter is to block this image frequency. Thus, the filter must let in the desired signal at $f_{RF}$ (so that its bandwidth must be larger than 10 KHz), but severely attenuate the image frequency which is 910 KHz away from the center frequency. It is therefore termed an image reject filter. We see that, for the AM broadcast radio application, a superhet architecture allows us to design the tunable image reject filter to somewhat relaxed specifications. However, the image reject filter lets in not only the signal from the desired station, but also those from adjacent stations. It is the job of the IF filter, which is tuned to the fixed frequency of 455 KHz, to filter out these adjacent stations. For this purpose, we use a highly selective filter at IF with a bandwidth of 10 KHz. Figure 3.30 illustrates these design considerations more generally.

Receivers for FM broadcast radio also commonly use a superhet architecture. The FM broadcast band ranges from 88-108 MHz, with carrier frequency separation of 200 KHz between adjacent stations. The IF is chosen at 10.7 MHz, so that the LO is tuned from 98.7 to 118.7 MHz for the choice $f_{LO} = f_{RF} + f_{IF}$. The RF filter specifications remain relaxed: it has to let in the desired signal of bandwidth 200 KHz, while rejecting an image frequency which is $2f_{IF} = 21.4$ MHz away from its center frequency. We discuss the structure of the FM broadcast signal, particularly the way in which stereo FM is transmitted, in more detail in Section 3.6.
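The image frequency computation for the high-side LO choice $f_{LO} = f_{RF} + f_{IF}$ can be sketched as follows (frequencies in KHz). The function name and the two station frequencies (1000 KHz for AM, 100 MHz for FM) are hypothetical examples. The check inside the function confirms that the desired and image frequencies both land at the IF after mixing.

```python
def high_side_plan(f_rf, f_if):
    """For f_LO = f_RF + f_IF, return (f_LO, image frequency f_IM = f_RF + 2*f_IF)."""
    f_lo = f_rf + f_if
    f_image = f_lo + f_if
    # Both the desired signal and the image are f_IF away from the LO,
    # so both are translated to the IF by the mixer.
    assert abs(f_rf - f_lo) == f_if and abs(f_image - f_lo) == f_if
    return f_lo, f_image


am_lo, am_image = high_side_plan(f_rf=1000, f_if=455)        # AM station, KHz
fm_lo, fm_image = high_side_plan(f_rf=100_000, f_if=10_700)  # FM station, KHz

print(am_lo, am_image, am_image - 1000)        # 1455 1910 910
print(fm_lo, fm_image, fm_image - 100_000)     # 110700 121400 21400
```

In both cases the image sits $2f_{IF}$ above the desired station, which is why a relatively broad tunable RF filter suffices to reject it.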

Roughly indexing the difficulty of implementing a filter by the ratio of its center frequency to its bandwidth, or its Q factor, with high Q being more difficult to implement, we have the following fundamental tradeoff for superhet receivers. If we use a large IF, then the Q needed for the image reject filter is smaller. On the other hand, the Q needed for the IF filter to reject an interfering signal whose frequency is near that of the desired signal becomes higher. In modern digital communication applications, superheterodyne reception with multiple IF stages may be used to work around this tradeoff, achieving the desired gain for the signal of interest and sufficiently attenuating interference from other signals, while achieving an adequate degree of image rejection. Image rejection can be enhanced by employing appropriately designed image-reject mixer architectures.

Direct  conversion  receivers:  With  the  trend  towards  increasing  monolithic  integration  of 
digital  communication  transceivers  for  applications  such  as  cellular  telephony  and  wireless  local 
area  networks,  the  superhet  architecture  is  often  being  supplanted  by  direct  conversion  (or  zero 
IF)  receivers,  in  which  the  passband  received  signal  is  directly  converted  down  to  baseband 
using  a  quadrature  mixer  at  the  RF  carrier  frequency.  In  this  case,  the  desired  signal  is  its 
own  image,  which  removes  the  necessity  for  image  rejection.  Moreover,  interfering  signals  can  be 
filtered  out  at  baseband,  often  using  sophisticated  digital  signal  processing  after  analog-to-digital 
conversion  (ADC),  provided  that  there  is  enough  dynamic  range  in  the  circuitry  to  prevent  a 
strong  interferer  from  swamping  the  desired  signal  prior  to  the  ADC.  In  contrast,  the  high  Q 
bandpass  filters  required  for  image  rejection  and  interference  suppression  in  the  superhet  design 
must often be implemented off-chip using, for example, surface acoustic wave (SAW) devices, which are bulky and costly. Thus, direct conversion is in some sense the “obvious” thing to do,
except  that  historically,  people  were  unable  to  make  it  work,  leading  to  the  superhet  architecture 
serving  as  the  default  design  through  most  of  the  twentieth  century. 

A  key  problem  with  direct  conversion  is  that  LO  leakage  into  the  RF  input  of  the  mixer  causes 
self-mixing,  leading  to  a  DC  offset.  While  a  DC  offset  can  be  calibrated  out,  the  main  problem 
is  that  it  can  saturate  the  amplifiers  following  the  mixer,  thus  swamping  out  the  contribution  of 
the  weaker  received  signal.  Note  that  the  DC  offset  due  to  LO  leakage  is  not  a  problem  with  a 
superhet architecture, since the DC term gets filtered out by the passband IF filter. Other problems with direct conversion include $1/f$ noise and susceptibility to second order nonlinearities, but discussion of these issues is beyond our current scope. However, since the 1990s, integrated circuit designers have managed to overcome these and other obstacles, and direct conversion receivers have become the norm for monolithic implementations of modern digital communication
transceivers.  These  include  cellular  systems  in  various  licensed  bands  ranging  from  900  MHz  to 
2  GHz,  and  WLANs  in  the  2.4  GHz  and  5  GHz  unlicensed  bands. 

The  insatiable  demand  for  communication  bandwidth  virtually  assures  us  that  we  will  seek  to 
exploit  frequency  bands  well  beyond  5  GHz,  and  circuit  designers  will  be  making  informed  choices 
between  the  superhet  and  direct  conversion  architectures  for  radios  at  these  higher  frequencies. 
For example, the 60 GHz band in the United States has 7 GHz of unlicensed spectrum; this band is susceptible to oxygen absorption, and is ideally suited for short range (e.g., 10-500 meters) communication both indoors and outdoors. Similarly, the 71-76 GHz and 81-86 GHz bands, which avoid oxygen absorption loss, are available for semi-unlicensed point-to-point “last mile” links. Just as with cellular and WLAN applications in lower frequency bands,
we  expect  that  proliferation  of  applications  using  these  “millimeter  (mm)  wave”  bands  would 
require  low-cost  integrated  circuit  transceiver  implementations.  Based  on  the  trends  at  lower 
frequencies,  one  is  tempted  to  conjecture  that  initial  circuit  designs  might  be  based  on  superhet 
architectures,  with  direct  conversion  receivers  becoming  subsequently  more  popular  as  designers 
become  more  comfortable  with  working  at  these  higher  frequencies.  It  is  interesting  to  note 
that  the  design  experience  at  lower  carrier  frequencies  does  not  go  to  waste;  for  example,  direct 
conversion  receivers  at,  say,  5  GHz,  can  serve  as  the  IF  stage  for  superhet  receivers  for  mm  wave 
communication. 




3.5  The  Phase  Locked  Loop 


The phase locked loop (PLL) is an effective FM demodulator, but also has a far broader range of applications, including frequency synthesis and synchronization. We therefore treat it separately from our coverage of FM. The PLL provides a canonical example of the use of feedback for estimation and synchronization in communication systems, a principle that is employed in variants such as the Costas loop and the delay locked loop.


Figure 3.31: PLL block diagram.


The  key  idea  behind  the  PLL,  depicted  in  Figure  3.31,  is  as  follows:  we  would  like  to  lock  on  to 
the  phase  of  the  input  to  the  PLL.  We  compare  the  phase  of  the  input  with  that  of  the  output  of 
a  voltage  controlled  oscillator  (VCO)  using  a  phase  detector.  The  difference  between  the  phases 
drives  the  input  of  the  VCO.  If  the  VCO  output  is  ahead  of  the  PLL  input  in  phase,  then  we 
would  like  to  retard  the  VCO  output  phase.  If  the  VCO  output  is  behind  the  PLL  input  in  phase, 
we  would  like  to  advance  the  VCO  output  phase.  This  is  done  by  using  the  phase  difference  to 
control  the  VCO  input.  Typically,  rather  than  using  the  output  of  the  phase  detector  directly 
for  this  purpose,  we  smooth  it  out  using  a  loop  filter  in  order  to  reduce  the  effect  of  noise. 


Figure 3.32: PLL realization using a mixer as phase detector.


Mixer as phase detector: The classical analog realization of the PLL is based on using a mixer (i.e., a multiplier) as a phase detector. To see how this works, consider the product of two sinusoids whose phases we are trying to align:

$$\cos(2\pi f_c t + \theta_1) \cos(2\pi f_c t + \theta_2) = \frac{1}{2} \cos(\theta_1 - \theta_2) + \frac{1}{2} \cos(4\pi f_c t + \theta_1 + \theta_2)$$

The second term on the right-hand side is a passband signal at $2f_c$ which can be filtered out by a lowpass filter. The first term contains the phase difference $\theta_1 - \theta_2$, and is to be used to drive the VCO so that we eventually match the phases. Thus, the first term should be small when we are near a phase match. Since the driving term is the cosine of the phase difference, the phase match condition is $\theta_1 - \theta_2 = \pi/2$. That is, using a mixer as our phase detector means that, when the PLL is locked, the phase at the VCO output is 90° offset from the phase of the PLL input. Now that we know this, we adopt a more convenient notation, changing variables to define a phase difference whose value at the desired matched state is zero rather than $\pi/2$. Let the PLL input be denoted by $A_c \cos(2\pi f_c t + \theta_i(t))$, and let the VCO output be denoted by $A_v \cos(2\pi f_c t + \theta_o(t) + \frac{\pi}{2}) = -A_v \sin(2\pi f_c t + \theta_o(t))$. The output of the mixer is now given by

$$-A_c A_v \cos(2\pi f_c t + \theta_i(t)) \sin(2\pi f_c t + \theta_o(t)) = \frac{A_c A_v}{2} \sin(\theta_i(t) - \theta_o(t)) - \frac{A_c A_v}{2} \sin(4\pi f_c t + \theta_i(t) + \theta_o(t))$$

The second term on the right-hand side is a passband signal at $2f_c$ which can be filtered out as before. The first term is the desired driving term, and with the change of notation, we note that the desired state, when the driving term is zero, corresponds to $\theta_i = \theta_o$. The mixer based realization of the PLL is shown in Figure 3.32.
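The mixer output expression can be sanity-checked numerically. In the sketch below, averaging the product over an integer number of carrier cycles stands in for the lowpass filter that removes the $2f_c$ term. The phases are held constant, and the function name, the carrier frequency, and the sampling density are all hypothetical choices. The averaged output should equal $\frac{A_c A_v}{2}\sin(\theta_i - \theta_o)$.

```python
import math


def averaged_mixer_output(theta_i, theta_o, ac=1.0, av=1.0,
                          fc=1e3, cycles=10, samples_per_cycle=200):
    """Average of -Ac*Av*cos(2*pi*fc*t + theta_i)*sin(2*pi*fc*t + theta_o)
    over a whole number of carrier cycles (a stand-in for lowpass filtering)."""
    n = cycles * samples_per_cycle
    dt = 1.0 / (fc * samples_per_cycle)
    total = 0.0
    for k in range(n):
        phase = 2 * math.pi * fc * k * dt
        total += -ac * av * math.cos(phase + theta_i) * math.sin(phase + theta_o)
    return total / n


for theta_i, theta_o in [(0.3, 0.1), (0.2, 0.2), (1.0, -0.5)]:
    measured = averaged_mixer_output(theta_i, theta_o)
    predicted = 0.5 * math.sin(theta_i - theta_o)
    print(f"theta_i - theta_o = {theta_i - theta_o:+.2f}: "
          f"measured {measured:+.6f}, predicted {predicted:+.6f}")
```

The driving term is zero when $\theta_i = \theta_o$ and, for small phase errors, is approximately proportional to the phase error, which is what makes it useful as an error signal.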

The instantaneous frequency of the VCO, measured relative to the reference frequency $f_c$, is proportional to its input $x(t)$. Thus the phase of the VCO output $-\sin(2\pi f_c t + \theta_o(t))$ is given by

$$\theta_o(t) = K_v \int_0^t x(\tau)\,d\tau$$

ignoring integration constants. Taking Laplace transforms, we have $\Theta_o(s) = K_v X(s)/s$. The reference frequency $f_c$ is chosen as the quiescent frequency of the VCO, which is the frequency it would produce when its input voltage is zero.


Figure 3.33: PLL realization using XOR gate as phase detector.


Mixed signal phase detectors: Modern hardware realizations of the PLL, particularly for applications involving digital waveforms (e.g., a clock signal), often realize the phase detector using digital logic. The most rudimentary of these is an exclusive or (XOR) gate, as shown in Figure 3.33. For the scenario depicted in the figure, we see that the average value of the output of the XOR gate is linearly related to the phase offset $\gamma$. Normalizing a period of the square wave to length $2\pi$, this DC value $V'$ is related to $\gamma$ as shown in Figure 3.34(a). Note that, for zero phase offset, we have $V' = V_{HI}$, and that the response is symmetric around $\gamma = 0$. In order to get a linear phase detector response going through the origin, we translate this curve along both axes: we define $V = V' - (V_{LO} + V_{HI})/2$ as a centered response, and we define the phase offset $\theta = \gamma - \frac{\pi}{2}$. Thus, the lock condition ($\theta = 0$) corresponds to the square waves being 90° out of phase. This translation gives us the phase response shown in Figure 3.34(b), which looks like a triangular version of the sinusoidal response for the mixer-based phase detector.

Figure 3.34: Response for the XOR phase detector. (a) DC value of output of XOR gate. (b) XOR phase detector output after axes translation.

The  simple  XOR-based  phase  detector  has  the  disadvantage  of  requiring  that  the  waveforms 
have  50%  duty  cycle.  In  practice,  more  sophisticated  phase  detectors,  often  based  on  edge 
detection,  are  used.  These  include  “phase-frequency  detectors”  that  directly  provide  information 
on  frequency  differences,  which  is  useful  for  rapid  locking.  While  discussion  of  the  many  phase 
detector  variants  employed  in  hardware  design  is  beyond  our  scope,  references  for  further  study 
are  provided  at  the  end  of  this  chapter. 

3.5.1  PLL  Applications 

Before  trying  to  understand  how  a  PLL  works  in  more  detail,  let  us  discuss  how  we  would  use 
it,  assuming  that  it  has  been  properly  designed.  That  is,  suppose  we  can  design  a  system  such 
as that depicted in Figure 3.32, such that $\theta_o(t) \approx \theta_i(t)$. What would we do with such a system?


(Block diagram labels: FM received signal; phase detector, whose output is a function of the
phase difference $\theta_i - \theta_o$; loop filter; VCO; FM demodulator output.)


Figure  3.35:  The  PLL  is  an  FM  demodulator. 


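PLL as FM demodulator: If the PLL input is an FM signal, its phase is given by

$$\theta_i(t) = 2\pi k_f \int_0^t m(\tau)\, d\tau$$

The VCO output phase is given by

$$\theta_o(t) = K_v \int_0^t x(\tau)\, d\tau$$

where $x(t)$ denotes the VCO input. If $\theta_o \approx \theta_i$, then $\frac{d\theta_o}{dt} \approx \frac{d\theta_i}{dt}$, so that

$$K_v x(t) \approx 2\pi k_f m(t)$$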

That  is,  the  VCO  input  is  approximately  equal  to  a  scaled  version  of  the  message.  Thus,  the 
PLL  is  an  FM  demodulator,  where  the  FM  signal  is  the  input  to  the  PLL,  and  the  demodulator 
output  is  the  VCO  input,  as  shown  in  Figure  3.35. 
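The demodulation property can be checked with a small phase-domain simulation. The following is a minimal sketch, not a model of any particular hardware: it assumes a first order loop with a mixer-type (sine) phase detector, works directly with phases rather than passband waveforms, and uses illustrative values for the tone frequency, the frequency sensitivity `kf`, and the loop gain `K` (none of which come from the text).

```python
import math

def pll_fm_demod(message, kf, K, dt):
    """Phase-domain first order PLL; returns the scaled VCO input as a message estimate."""
    theta_i = 0.0   # input phase: 2*pi*kf times the running integral of the message
    theta_o = 0.0   # VCO output phase
    estimates = []
    for m in message:
        # K_v * x(t): VCO input scaled by the VCO gain (sine phase detector, G(s) = 1)
        kv_x = K * math.sin(theta_i - theta_o)
        estimates.append(kv_x / (2 * math.pi * kf))
        theta_o += kv_x * dt                   # the VCO integrates its input
        theta_i += 2 * math.pi * kf * m * dt   # the FM phase accumulates the message
    return estimates

dt = 1e-6                          # time step in seconds (assumed)
fm, kf, K = 100.0, 50.0, 5.0e4     # tone (Hz), sensitivity (Hz per unit), loop gain (rad/s)
message = [math.cos(2 * math.pi * fm * n * dt) for n in range(20000)]
estimate = pll_fm_demod(message, kf, K, dt)

# Skip the initial transient (many multiples of 1/K) before comparing.
worst = max(abs(e - m) for e, m in zip(estimate[2000:], message[2000:]))
print(f"worst-case error after lock: {worst:.4f}")
```

The residual error is dominated by the small lag of the loop, which shrinks as the loop gain grows relative to the message bandwidth.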




Figure  3.36:  Frequency  synthesis  using  a  PLL  by  inserting  a  frequency  divider  into  the  loop. 


PLL  as  frequency  synthesizer:  The  PLL  is  often  used  to  synthesize  the  local  oscillators  used 
in  communication  transmitters  and  receivers.  In  a  typical  scenario,  we  might  have  a  crystal 
oscillator  which  provides  an  accurate  frequency  reference  at  a  relatively  low  frequency,  say  40 
MHz.  We  wish  to  use  this  to  derive  an  accurate  frequency  reference  at  a  higher  frequency,  say  1 
GHz,  which  might  be  the  local  oscillator  used  at  an  IF  or  RF  stage  in  the  transceiver.  We  have 
a  VCO  that  can  produce  frequencies  around  1  GHz  (but  is  not  calibrated  to  produce  the  exact 
value of the desired frequency), and we wish to use it to obtain a frequency $f_o$ that is exactly
$K$ times the crystal frequency $f_{crystal}$. This can be achieved by adding a frequency divider into
the  PLL  loop,  as  shown  in  Figure  3.36.  Such  frequency  dividers  can  be  implemented  digitally 
by  appropriately  skipping  pulses.  Many  variants  of  this  basic  concept  are  possible,  such  as  using 
multiple  frequency  dividers,  frequency  multipliers,  or  multiple  interacting  loops. 
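To see why a divider in the loop pins the output frequency, the following sketch simulates the phase error between the reference and the divided VCO output. It is only an illustration under stated assumptions: a first order loop with a sine phase detector, a hypothetical VCO quiescent frequency `f_q` that is off by 10 MHz, and a hypothetical loop gain `K_loop`. The divider ratio is called `N` here to avoid confusion with the loop gain.

```python
import math

f_ref = 40.0                      # crystal reference in MHz (from the text)
f_target = 1000.0                 # desired output in MHz (from the text)
N = round(f_target / f_ref)       # divider ratio (the text calls this K)
f_q = 990.0                       # assumed VCO quiescent frequency in MHz
K_loop = 2 * math.pi * 50.0       # assumed loop gain in rad per microsecond

dt = 1e-4                         # time step in microseconds
err = 0.0                         # phase error: reference minus divided VCO phase (rad)
for _ in range(20000):
    f_out = f_q + K_loop * math.sin(err) / (2 * math.pi)   # VCO frequency in MHz
    # The divider scales the VCO frequency (and phase) by 1/N before comparison.
    err += 2 * math.pi * (f_ref - f_out / N) * dt

f_out = f_q + K_loop * math.sin(err) / (2 * math.pi)
print(N, f_out)
```

Once the loop settles, the divided output runs at exactly the reference frequency, so the VCO itself runs at `N` times the reference even though its quiescent frequency was miscalibrated.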

All  of  these  applications  rely  on  the  basic  property  that  the  VCO  output  phase  successfully  tracks 
some  reference  phase  using  the  feedback  in  the  loop.  Let  us  now  try  to  get  some  insight  into  how 
this  happens,  and  into  the  impact  of  various  parameters  on  the  PLL’s  performance. 


3.5.2  Mathematical  Model  for  the  PLL 

The mixer-based PLL in Figure 3.32 can be modeled as shown in Figure 3.37, where $\theta_i(t)$ is
the input phase, and $\theta_o(t)$ is the output phase. It is also useful to define the corresponding
instantaneous frequencies (or rather, frequency deviations from the VCO quiescent frequency
$f_c$):

$$f_i(t) = \frac{1}{2\pi} \frac{d\theta_i(t)}{dt}, \quad f_o(t) = \frac{1}{2\pi} \frac{d\theta_o(t)}{dt}$$

The phase and frequency errors are defined as

$$\theta_e(t) = \theta_i(t) - \theta_o(t), \quad f_e(t) = f_i(t) - f_o(t)$$

In deriving this model, we can ignore the passband term at $2f_c$, which will get rejected by the
integration operation due to the VCO, as well as by the loop filter (if a nontrivial lowpass loop
filter is employed). From Figure 3.32, the sine of the phase difference is amplified by $\frac{1}{2} A_c A_v$ due
to the amplitudes of the PLL input and VCO output. This is passed through the loop filter,
which has transfer function $G(s)$, and then through the VCO, which has a transfer function $K_v/s$.
The loop gain $K$ shown in Figure 3.37 is set to be the product $K = \frac{1}{2} A_c A_v K_v$ (in addition, the
loop gain also includes additional amplification or attenuation in the loop that is not accounted
for in the transfer function $G(s)$).

Figure 3.37: Nonlinear model for mixer-based PLL.

Figure 3.38: Linearized PLL model.


The model in Figure 3.37 is difficult to analyze because of the $\sin(\cdot)$ nonlinearity after the phase
difference operation. One way to avoid this difficulty is to linearize the model by simply dropping
the nonlinearity. The motivation is that, when the input and output phases are close, as is the
case when the PLL is in tracking mode, then

$$\sin(\theta_i - \theta_o) \approx \theta_i - \theta_o$$

Applying this approximation, we obtain the linearized model of Figure 3.38. Note that, for the
XOR-based response shown in Figure 3.34(b), the response is exactly linear for $|\theta| < \frac{\pi}{2}$.

3.5.3  PLL  Analysis 

Under the linearized model, the PLL becomes an LTI system whose analysis is conveniently
performed using the Laplace transform. From Figure 3.38, we see that

$$(\Theta_i(s) - \Theta_o(s)) K G(s)/s = \Theta_o(s)$$

from which we infer the input-output relationship

$$H(s) = \frac{\Theta_o(s)}{\Theta_i(s)} = \frac{K G(s)}{s + K G(s)} \tag{3.32}$$

It is also useful to express the phase error $\theta_e$ in terms of the input $\theta_i$, as follows:

$$H_e(s) = \frac{\Theta_e(s)}{\Theta_i(s)} = \frac{\Theta_i(s) - \Theta_o(s)}{\Theta_i(s)} = 1 - H(s) = \frac{s}{s + K G(s)} \tag{3.33}$$


For this LTI model, the same transfer functions also govern the relationships between the input
and output instantaneous frequencies: since $F_i(s) = \frac{s}{2\pi}\Theta_i(s)$ and $F_o(s) = \frac{s}{2\pi}\Theta_o(s)$, we obtain
$F_o(s)/F_i(s) = \Theta_o(s)/\Theta_i(s)$. Thus, we have

$$\frac{F_o(s)}{F_i(s)} = H(s) = \frac{K G(s)}{s + K G(s)} \tag{3.34}$$

$$\frac{F_i(s) - F_o(s)}{F_i(s)} = H_e(s) = \frac{s}{s + K G(s)} \tag{3.35}$$

First order PLL: When we have a trivial loop filter, $G(s) = 1$, we obtain the first order response

$$H(s) = \frac{K}{s + K}, \quad H_e(s) = \frac{s}{s + K}$$

which is a stable response for loop gain $K > 0$, with a single pole at $s = -K$. It is interesting to
see what happens when the input phase is a step function, $\theta_i(t) = \Delta\theta\, I_{[0,\infty)}(t)$, or $\Theta_i(s) = \Delta\theta/s$.
We obtain

$$\Theta_o(s) = H(s)\Theta_i(s) = \frac{K \Delta\theta}{s(s + K)} = \frac{\Delta\theta}{s} - \frac{\Delta\theta}{s + K}$$

Taking the inverse Laplace transform, we obtain

$$\theta_o(t) = \Delta\theta (1 - e^{-Kt}) I_{[0,\infty)}(t)$$


so that $\theta_o(t) \to \Delta\theta$ as $t \to \infty$. Thus, the first order PLL can track a sudden change in phase,
with the output phase converging to the input phase exponentially fast. The residual phase error
is zero. Note that we could also have inferred this quickly from the final value theorem, without
taking the inverse Laplace transform:

$$\lim_{t \to \infty} \theta_e(t) = \lim_{s \to 0} s\Theta_e(s) = \lim_{s \to 0} s H_e(s) \Theta_i(s) \tag{3.36}$$

Specializing to the setting of interest, we obtain

$$\lim_{t \to \infty} \theta_e(t) = \lim_{s \to 0} s \cdot \frac{s}{s + K} \cdot \frac{\Delta\theta}{s} = 0$$
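The final value computation can be mimicked numerically by evaluating $s H_e(s) \Theta_i(s)$ at a very small real value of $s$. The sketch below does this for the phase step and compares it with the time-domain phase error $\theta_e(t) = \Delta\theta\, e^{-Kt}$ implied by the closed-form output phase. The helper name `final_value` and the numerical values of `K` and `dtheta` are illustrative assumptions; the shortcut is meaningful only when the loop is stable, as it is here for $K > 0$.

```python
import math

def final_value(H_e, Theta_i, s=1e-9):
    """Approximate lim_{s->0} s * H_e(s) * Theta_i(s) by evaluating at a small real s."""
    return s * H_e(s) * Theta_i(s)

K = 5.0e3        # loop gain in rad/s (illustrative)
dtheta = 0.5     # phase step in radians (illustrative)

H_e = lambda s: s / (s + K)        # first order error transfer function
Theta_i = lambda s: dtheta / s     # Laplace transform of the phase step

def theta_e(t):
    """Phase error theta_i(t) - theta_o(t) for t >= 0, from the closed-form output phase."""
    return dtheta - dtheta * (1.0 - math.exp(-K * t))

print(final_value(H_e, Theta_i))   # close to zero
print(theta_e(5e-3))               # also close to zero after many time constants
```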


We now examine the response of the first order PLL to a frequency step $\Delta f$, so that the
instantaneous input frequency is $f_i(t) = \Delta f\, I_{[0,\infty)}(t)$. The corresponding Laplace transform is
$F_i(s) = \Delta f/s$. The input phase is the integral of the instantaneous frequency:

$$\theta_i(t) = 2\pi \int_0^t f_i(\tau)\, d\tau$$

The Laplace transform of the input phase is therefore given by

$$\Theta_i(s) = 2\pi F_i(s)/s = \frac{2\pi \Delta f}{s^2}$$

Given that the input-output relationships are identical for frequency and phase, we can reuse
the computations we did for the phase step input, replacing phase by frequency, to conclude that
$f_o(t) = \Delta f (1 - e^{-Kt}) I_{[0,\infty)}(t) \to \Delta f$ as $t \to \infty$, so that the steady-state frequency error is zero.
The corresponding output phase trajectory is left as an exercise, but we can use the final value
theorem to compute the limiting value of the phase error:

$$\lim_{t \to \infty} \theta_e(t) = \lim_{s \to 0} s \cdot \frac{s}{s + K} \cdot \frac{2\pi \Delta f}{s^2} = \frac{2\pi \Delta f}{K}$$

Thus, the first order PLL can adapt its frequency to track a step frequency change, but there is
a nonzero steady-state phase error. This can be fixed by increasing the order of the PLL, as we
now show.
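The steady-state values can also be checked by integrating the linearized loop directly. In the time domain, the first order linearized model gives $\frac{d\theta_e}{dt} = 2\pi\Delta f - K\theta_e$ for a frequency step. The sketch below applies a simple forward Euler step to this equation; the step size and the values of `K` and `df` are assumptions chosen for illustration.

```python
import math

K = 5.0e3      # loop gain in rad/s (illustrative)
df = 100.0     # frequency step in Hz (illustrative)
dt = 1e-6      # Euler step in seconds (assumed, much smaller than 1/K)

theta_e = 0.0
for _ in range(5000):   # simulate 5 ms, i.e. 25 time constants
    # Linearized loop: d(theta_e)/dt = d(theta_i)/dt - d(theta_o)/dt = 2*pi*df - K*theta_e
    theta_e += (2 * math.pi * df - K * theta_e) * dt

f_e = (2 * math.pi * df - K * theta_e) / (2 * math.pi)   # residual frequency error in Hz
print(theta_e, 2 * math.pi * df / K, f_e)
```

The simulated phase error settles at $2\pi\Delta f/K$ while the frequency error decays to zero, in line with the final value theorem.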

Second order PLL: We now introduce a loop filter which feeds back both the phase error and
the integral of the phase error to the VCO input (in control theory terminology, we are using
“proportional plus integral” feedback). That is, $G(s) = 1 + a/s$, where $a > 0$. This yields the
second order response

$$H(s) = \frac{K G(s)}{s + K G(s)} = \frac{K(s + a)}{s^2 + Ks + Ka}$$

$$H_e(s) = \frac{s}{s + K G(s)} = \frac{s^2}{s^2 + Ks + Ka}$$

The poles of the response are at $s = \frac{-K \pm \sqrt{K^2 - 4Ka}}{2}$. It is easy to check that the response is stable
(i.e., the poles are in the left half plane) for $K > 0$. The poles are conjugate symmetric with an
imaginary component if $K^2 - 4Ka < 0$, or $K < 4a$; otherwise they are both real-valued. Note
that the phase error due to a step frequency change does go to zero. This is easily seen by
invoking the final value theorem (3.36):

$$\lim_{t \to \infty} \theta_e(t) = \lim_{s \to 0} s \cdot \frac{s^2}{s^2 + Ks + Ka} \cdot \frac{2\pi \Delta f}{s^2} = 0$$

Thus, the second order PLL has zero steady state frequency and phase errors when responding
to a constant frequency offset.
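As a quick numerical illustration, the sketch below computes the poles of $s^2 + Ks + Ka$ and integrates the linearized second order loop for a frequency step. It assumes forward Euler integration and illustrative values of `K`, `a` and `df`; the state variable `integ` is a name introduced here for the running integral of the phase error.

```python
import cmath
import math

def second_order_poles(K, a):
    """Roots of s^2 + K*s + K*a = 0."""
    disc = cmath.sqrt(K * K - 4 * K * a)
    return (-K + disc) / 2, (-K - disc) / 2

K, a = 5.0e3, 2.5e3    # illustrative; K < 4a gives a complex-conjugate pole pair
df = 100.0             # frequency step in Hz (illustrative)
dt = 1e-6              # Euler step in seconds (assumed)

p1, p2 = second_order_poles(K, a)

theta_e, integ = 0.0, 0.0          # phase error and its running integral
for _ in range(20000):             # simulate 20 ms
    vco_in = K * (theta_e + a * integ)            # proportional plus integral feedback
    integ += theta_e * dt
    theta_e += (2 * math.pi * df - vco_in) * dt   # d(theta_e)/dt = 2*pi*df - d(theta_o)/dt

print(p1, p2, theta_e)
```

With these values the poles sit at $-2500 \pm 2500j$, and the phase error decays to zero because the integrator, rather than a standing phase error, supplies the constant VCO input needed to hold the frequency offset.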

We  have  seen  now  that  the  first  order  PLL  can  handle  step  phase  changes,  and  the  second  order 
PLL  can  handle  step  frequency  changes,  while  driving  the  steady-state  phase  error  to  zero.  This 
pattern  continues  as  we  keep  increasing  the  order  of  the  PLL:  for  example,  a  third  order  PLL 
can handle a linear frequency ramp, which corresponds to $\Theta_i(s)$ being proportional to $1/s^3$.

Linearized  analysis  provides  quick  insight  into  the  complexity  of  the  phase/frequency  variations 
that  the  PLL  can  track,  as  a  function  of  the  choice  of  loop  filter  and  loop  gain.  We  now  take 
another look at the first order PLL, accounting for the $\sin(\cdot)$ nonlinearity in Figure 3.37, in
order  to  provide  a  glimpse  of  the  approach  used  for  handling  the  nonlinear  differential  equations 
involved,  and  to  compare  the  results  with  the  linearized  analysis. 

Nonlinear model for the first order PLL: Let us try to express the phase error $\theta_e$ in terms of
the input phase for a first order PLL, with $G(s) = 1$. The model of Figure 3.37 can be expressed
in the time domain as:

$$\int_0^t K \sin(\theta_e(\tau))\, d\tau = \theta_o(t) = \theta_i(t) - \theta_e(t)$$

Differentiating with respect to $t$, we obtain

$$K \sin\theta_e = \frac{d\theta_i}{dt} - \frac{d\theta_e}{dt} \tag{3.37}$$

(Both $\theta_e$ and $\theta_i$ are functions of $t$, but we suppress the dependence for notational simplicity.)
Let us now specialize to the specific example of a step frequency input, for which

$$\frac{d\theta_i}{dt} = 2\pi \Delta f$$

Plugging into (3.37) and rearranging, we get

$$\frac{d\theta_e}{dt} = 2\pi \Delta f - K \sin\theta_e \tag{3.38}$$

Figure  3.39:  Phase  plane  plot  for  first  order  PLL. 


We cannot solve the nonlinear differential equation (3.38) for $\theta_e$ analytically, but we can get
useful insight by a “phase plane plot” of $\frac{d\theta_e}{dt}$ against $\theta_e$, as shown in Figure 3.39. Since $\sin\theta_e \le 1$,
we have $\frac{d\theta_e}{dt} \ge 2\pi\Delta f - K$, so that, if $\Delta f > \frac{K}{2\pi}$, then $\frac{d\theta_e}{dt} > 0$ for all $t$. Thus, for large enough
frequency offset, the loop never locks. On the other hand, if $\Delta f < \frac{K}{2\pi}$, then the loop does lock.
In this case, starting from an initial error, say $\theta_e(0)$, the phase error follows the trajectory to the
right (if the derivative is positive) or left (if the derivative is negative) until it hits a point at
which $\frac{d\theta_e}{dt} = 0$. From (3.38), this happens when

$$\sin\theta_e = \frac{2\pi \Delta f}{K} \tag{3.39}$$

Due to the periodicity of the sine function, if $\theta$ is a solution to the preceding equation, so is
$\theta + 2\pi$. Thus, if the equation has a solution, there must be at least one solution in the basic
interval $[-\pi, \pi]$. Moreover, since $\sin\theta = \sin(\pi - \theta)$, if $\theta$ is a solution, so is $\pi - \theta$, so that there
are actually two solutions in $[-\pi, \pi]$. Let us denote by $\theta_e^{(0)} = \sin^{-1}\left(\frac{2\pi\Delta f}{K}\right)$ the solution that
lies in the interval $[-\pi/2, \pi/2]$. This forms a stable equilibrium: from (3.38), we see that the
derivative is negative for phase error slightly above $\theta_e^{(0)}$, and is positive for phase error
slightly below $\theta_e^{(0)}$, so that the phase error is driven back to $\theta_e^{(0)}$ in each case. Using exactly
the same argument, we see that the points $\theta_e^{(0)} + 2\pi n$ are also stable equilibria, where $n$ takes
integer values. However, another solution to (3.39) is $\theta_e^{(1)} = \pi - \theta_e^{(0)}$, and translations of it
by multiples of $2\pi$. It is easy to see that this is an unstable equilibrium: when there is a slight perturbation,
the sign of the derivative is such that it drives the phase error away from $\theta_e^{(1)}$. In general,
$\theta_e^{(1)} + 2\pi n$ are unstable equilibria, where $n$ takes integer values. Thus, if the frequency offset is
within the “pull-in range” $\frac{K}{2\pi}$ of the first order PLL, then the steady state phase offset (modulo
$2\pi$) is $\theta_e^{(0)} = \sin^{-1}\left(\frac{2\pi\Delta f}{K}\right)$, which, for small values of $\frac{2\pi\Delta f}{K}$, is approximately equal to the value
predicted by the linearized analysis.
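The phase plane argument can be checked by integrating (3.38) numerically. The sketch below uses forward Euler integration with an assumed step size and an illustrative loop gain, and compares one offset inside the pull-in range with one outside it. The function name `simulate_phase_error` is introduced here for illustration.

```python
import math

def simulate_phase_error(df, K, theta0=0.0, dt=1e-6, steps=20000):
    """Forward Euler integration of (3.38): d(theta_e)/dt = 2*pi*df - K*sin(theta_e)."""
    theta_e = theta0
    for _ in range(steps):
        theta_e += (2 * math.pi * df - K * math.sin(theta_e)) * dt
    return theta_e

K = 5.0e3                                   # rad/s, so the pull-in range K/(2*pi) is about 796 Hz
locked = simulate_phase_error(df=500.0, K=K)        # offset inside the pull-in range
predicted = math.asin(2 * math.pi * 500.0 / K)      # stable equilibrium from (3.39)
linearized = 2 * math.pi * 500.0 / K                # value from the linearized analysis
slipping = simulate_phase_error(df=1000.0, K=K)     # offset outside the pull-in range

print(locked, predicted, linearized)
print(slipping)
```

For the 500 Hz offset the phase error settles at the arcsine value, which is somewhat larger than the linearized prediction because the offset is not small relative to the pull-in range. For the 1000 Hz offset the phase error keeps growing, which is the cycle-slipping behavior of a loop that never locks.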

Linear  versus  nonlinear  model:  Roughly  speaking,  the  nonlinear  model  (which  we  simply 
simulate  when  phase-plane  plots  get  too  complicated)  tells  us  when  the  PLL  locks,  while  the 
linearized  analysis  provides  accurate  estimates  when  the  PLL  does  lock.  The  linearized  model 
also  tells  us  something  about  scenarios  when  the  PLL  does  not  lock:  when  the  phase  error  blows 
up  for  the  linearized  model,  it  indicates  that  the  PLL  will  perform  poorly.  This  is  because  the 
linearized  model  holds  under  the  assumption  that  the  phase  error  is  small;  if  the  phase  error 
under  this  optimistic  assumption  turns  out  not  to  be  small,  then  our  initial  assumption  must 
have  been  wrong,  and  the  phase  error  must  be  large. 


Figure  3.40:  PLL  for  Example  3.5.1. 


The  following  worked  problem  illustrates  application  of  linearized  PLL  analysis. 

Example  3.5.1  Consider  the  PLL  shown  in  Figure  3.40,  assumed  to  be  locked  at  time  zero. 

(a)  Suppose  that  the  input  phase  jumps  by  e  =  2.72  radians  at  time  zero  (set  the  phase  just 
before  the  jump  to  zero,  without  loss  of  generality).  How  long  does  it  take  for  the  difference 
between  the  PLL  input  phase  and  VCO  output  phase  to  shrink  to  1  radian?  (Make  sure  you 
specify  the  unit  of  time  that  you  use.) 

(b)  Find  the  limiting  value  of  the  phase  error  (in  radians)  if  the  frequency  jumps  by  1  KHz  just 
after  time  zero. 

Solution: Let $\theta_e(t) = \theta_i(t) - \theta_o(t)$ denote the phase error. In the $s$ domain, it is related to the
input phase as follows:

$$\Theta_i(s) - \frac{K}{s}\Theta_e(s) = \Theta_e(s)$$

so that

$$\frac{\Theta_e(s)}{\Theta_i(s)} = \frac{s}{s + K}$$

(a) For a phase jump of $e$ radians at time zero, we have $\Theta_i(s) = \frac{e}{s}$, which yields

$$\Theta_e(s) = \Theta_i(s) \frac{s}{s + K} = \frac{e}{s + K}$$

Going to the time domain, we have

$$\theta_e(t) = e\, e^{-Kt} = e^{1 - Kt}$$

so that $\theta_e(t) = 1$ for $1 - Kt = 0$, or $t = \frac{1}{K} = \frac{1}{5}$ milliseconds.

(b) For a frequency jump of $\Delta f$, the Laplace transform of the input phase is given by

$$\Theta_i(s) = \frac{2\pi \Delta f}{s^2}$$

so that the phase error is given by

$$\Theta_e(s) = \Theta_i(s) \frac{s}{s + K}$$

Using the final value theorem, we have

$$\lim_{t \to \infty} \theta_e(t) = \lim_{s \to 0} s\Theta_e(s) = \lim_{s \to 0} s \cdot \frac{2\pi \Delta f}{s(s + K)} = \frac{2\pi \Delta f}{K}$$

For $\Delta f = 1$ KHz and $K = 5$ KHz/radian, this yields a phase error of $2\pi/5$ radians, or 72°.
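The numbers in this example are easy to verify. The short sketch below works in milliseconds and KHz, so that the loop gain of 5 KHz/radian enters simply as `K = 5.0` per millisecond; the variable names are introduced here for illustration.

```python
import math

K = 5.0      # loop gain: 5 KHz/radian, i.e. 5 per millisecond
df = 1.0     # frequency jump in KHz

# Part (a): the phase error e * exp(-K t) equals 1 radian when K t = 1.
t_one_radian_ms = 1.0 / K
check = math.e * math.exp(-K * t_one_radian_ms)

# Part (b): limiting phase error 2*pi*df/K from the final value theorem.
phase_error_rad = 2 * math.pi * df / K
phase_error_deg = math.degrees(phase_error_rad)

print(t_one_radian_ms, check, phase_error_rad, phase_error_deg)
```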


3.6  Some  Analog  Communication  Systems 

Some  of  the  analog  communication  systems  that  we  encounter  (or  at  least,  used  to  encounter) 
in  our  daily  lives  include  broadcast  radio  and  television.  We  have  already  discussed  AM  radio  in 
the  context  of  the  superhet  receiver.  We  now  briefly  discuss  FM  radio  and  television.  Our  goal 
is  to  highlight  design  concepts,  and  the  role  played  in  these  systems  by  the  various  modulation 
formats  we  have  studied,  rather  than  to  provide  a  detailed  technical  description.  Other  commonly 
encountered  examples  of  analog  communication  that  we  do  not  discuss  include  analog  storage 
media  (audiotapes  and  videotapes),  analog  wireline  telephony,  analog  cellular  telephony,  amateur 
ham  radio,  and  wireless  microphones. 


3.6.1  FM  radio 


(Spectrum sketch labels: L+R signal up to 15 KHz; pilot at 19 KHz; DSB-SC modulated L-R signal
from 23 KHz to 53 KHz, centered at 38 KHz. Horizontal axis: frequency in KHz.)

Figure  3.41:  Spectrum  of  baseband  input  to  FM  modulator  for  FM  stereo  broadcast. 
FM mono radio employs a peak frequency deviation of 75 KHz, with the baseband audio message
signal  bandlimited  to  15  KHz;  this  corresponds  to  a  modulation  index  of  5.  Using  Carson’s 
formula,  the  bandwidth  of  the  FM  radio  signal  can  be  estimated  as  180  KHz.  The  separation 
between  adjacent  radio  stations  is  200  KHz.  FM  stereo  broadcast  transmits  two  audio  channels, 
“left”  and  “right,”  in  a  manner  that  is  backwards  compatible  with  mono  broadcast,  in  that 
a  standard  mono  receiver  can  extract  the  sum  of  the  left  and  right  channels,  while  remaining 
oblivious  to  whether  the  broadcast  signal  is  mono  or  stereo.  The  structure  of  the  baseband  signal 
into  the  FM  modulator  is  shown  in  Figure  3.41.  The  sum  of  the  left  and  right  channels,  or  the 
L  +  R  signal,  occupies  a  band  from  30  Hz  to  15  KHz.  The  difference,  or  the  L  —  R  signal  (which 
also  has  a  bandwidth  of  15  KHz),  is  modulated  using  DSB-SC,  using  a  carrier  frequency  of  38 
KHz,  and  hence  occupies  a  band  from  23  KHz  to  53  KHz.  A  pilot  tone  at  19  KHz,  at  half  the 
carrier  frequency  for  the  DSB  signal,  is  provided  to  enable  coherent  demodulation  of  the  DSB-SC 
signal.  The  spacing  between  adjacent  FM  stereo  broadcast  stations  is  still  200  KHz,  which  makes 
it  a  somewhat  tight  fit  (if  we  apply  Carson’s  formula  with  a  maximum  frequency  deviation  of  75 
KHz,  we  obtain  an  RF  bandwidth  of  256  KHz). 
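The bandwidth figures quoted above follow from Carson's formula, taken here in the form $B_{FM} \approx 2(\Delta f_{max} + B)$, where $B$ is the bandwidth of the baseband signal into the FM modulator. The helper name `carson_bandwidth_khz` is introduced for this sketch.

```python
def carson_bandwidth_khz(peak_deviation_khz, baseband_bandwidth_khz):
    """Carson's rule estimate: twice the sum of peak deviation and baseband bandwidth."""
    return 2.0 * (peak_deviation_khz + baseband_bandwidth_khz)

peak_deviation = 75.0                                 # KHz
modulation_index = peak_deviation / 15.0              # mono audio bandlimited to 15 KHz
mono = carson_bandwidth_khz(peak_deviation, 15.0)     # mono baseband ends at 15 KHz
stereo = carson_bandwidth_khz(peak_deviation, 53.0)   # stereo composite ends at 53 KHz

print(modulation_index, mono, stereo)
```

The mono estimate fits inside the 200 KHz station spacing, while the stereo estimate exceeds it, which is the tight fit noted above.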




Figure  3.42:  Block  diagram  of  a  simple  FM  stereo  transmitter. 


The  format  of  the  baseband  signal  in  Figure  3.41  (in  particular,  the  DSB-SC  modulation  of  the 
difference  signal)  seems  rather  contrived,  but  the  corresponding  modulator  can  be  implemented 
quite  simply,  as  sketched  in  Figure  3.42:  we  simply  switch  between  the  L  and  R  channel  audio 
signals using a 38 KHz clock. As we show in one of the problems, this directly yields the L + R
signal,  plus  the  DSB-SC  modulated  L  —  R  signal.  It  remains  to  add  in  the  19  KHz  pilot  before 
feeding  the  composite  baseband  signal  to  the  FM  modulator. 
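A rough numerical check of the switching idea is sketched below. To keep it short, it assumes that the L and R signals are held at constant levels over the observation window, that the switch spends exactly half of each clock period on each channel, and that there are 64 samples per clock period. Under these assumptions the switched waveform equals $(L + R)/2$ plus $(L - R)$ times a zero-mean square wave, whose fundamental at the clock frequency has amplitude $2/\pi$. The function names are introduced for illustration.

```python
import cmath
import math

SAMPLES_PER_CLOCK = 64     # samples per 38 KHz clock period (assumed)
PERIODS = 100              # number of clock periods observed (assumed)

def switched_signal(left, right):
    """Pass L during the first half of each clock period and R during the second half."""
    out = []
    for n in range(SAMPLES_PER_CLOCK * PERIODS):
        take_left = (n % SAMPLES_PER_CLOCK) < SAMPLES_PER_CLOCK // 2
        out.append(left if take_left else right)
    return out

def amplitude_at_harmonic(x, k):
    """Amplitude of the component at k times the clock frequency (k = 0 gives the mean)."""
    acc = sum(v * cmath.exp(-2j * math.pi * k * n / SAMPLES_PER_CLOCK)
              for n, v in enumerate(x))
    scale = 1.0 if k == 0 else 2.0
    return scale * abs(acc) / len(x)

L, R = 0.8, 0.3                             # illustrative constant channel levels
s = switched_signal(L, R)
baseband = amplitude_at_harmonic(s, 0)      # proportional to L + R
carrier = amplitude_at_harmonic(s, 1)       # proportional to L - R, at the clock frequency

print(baseband, (L + R) / 2)
print(carrier, (2 / math.pi) * (L - R))
```

The baseband term tracks the sum of the channels and the term at the clock frequency tracks their difference, up to fixed scale factors. The square wave also produces components at odd harmonics of the clock, which this sketch does not examine.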

The  receiver  employs  an  FM  demodulator  to  obtain  an  estimate  of  the  baseband  transmitted 
signal.  The  L  +  R  signal  is  obtained  by  bandlimiting  the  output  of  the  FM  demodulator  to  15 
KHz  using  a  lowpass  filter;  this  is  what  an  oblivious  mono  receiver  would  do.  A  stereo  receiver, 
in  addition,  processes  the  output  of  the  FM  demodulator  in  the  band  from  15  KHz  to  53  KHz. 
It  extracts  the  19  KHz  pilot  tone,  doubles  its  frequency  to  obtain  a  coherent  carrier  reference, 
and  uses  that  to  demodulate  the  L  —  R  signal  sent  using  DSB-SC.  It  then  obtains  the  L  and  R 
channels  by  adding  and  subtracting  the  L  +  R  and  L  —  R  signals  from  each  other,  respectively. 


3.6.2  Analog  broadcast  TV 

While analog broadcast TV is obsolete, and is being replaced by digital TV as we speak, we
discuss it briefly here to highlight a few features. First, it illustrates an application of several
modulation schemes: VSB (for intensity information), quadrature modulation (for the color
information), and FM (for audio information). Second, it is an interesting example of how the
embedding of different kinds of information in analog form must account for the characteristics
of the information source (video) and destination (a cathode ray tube TV monitor). This
customized, and rather painful, design process is in contrast to the generality and conceptual
simplification provided by the source-channel separation principle in digital communication
(mentioned in Chapter 1). Indeed, from Chapter 4 onwards, where we restrict attention to digital
communication, we do not need to discuss source characteristics.


(Panel labels: CRT schematic, showing the fluorescent screen and the magnetic fields controlling
the beam trajectory; raster scan pattern, showing horizontal line scan and horizontal retrace;
controls needed for raster scan, showing horizontal position control and vertical position control.)


Figure  3.43:  Implementing  raster  scan  in  a  CRT  monitor  requires  magnetic  fields  controlled  by 
sawtooth  waveforms. 

We first need a quick discussion of CRT TV monitors. An electron beam impinging on a fluorescent
screen is used to emit the light that we perceive as the image on the TV. The electron beam
is  “raster  scanned”  in  horizontal  lines  moving  down  the  screen,  with  its  horizontal  and  vertical 
location  controlled  by  two  magnetic  fields  created  by  voltages,  as  shown  in  Figure  3.43.  We  rely 
on  the  persistence  of  human  vision  to  piece  together  these  discrete  scans  into  a  continuous  image 
in  space  and  time.  Black  and  white  TV  monitors  use  a  phosphor  (or  fluorescent  material)  that 
emits  white  light  when  struck  by  electrons.  Color  TV  monitors  use  three  kinds  of  phosphors, 
typically  arranged  as  dots  on  the  screen,  which  emit  red,  green  and  blue  light,  respectively,  when 
struck  by  electrons.  Three  electron  beams  are  used,  one  for  each  color.  The  intensity  of  the 
emitted  light  is  controlled  by  the  intensity  of  the  electron  beam.  For  historical  reasons,  the  scan 
rate  is  chosen  to  be  equal  to  the  frequency  of  the  AC  power  (otherwise,  for  the  power  supplies 
used  at  the  time,  rolling  bars  would  appear  on  the  TV  screen).  In  the  United  States,  this  means 
that  the  scan  rate  is  set  at  60  Hz  (the  frequency  of  the  AC  mains). 

In order to enable the TV receiver to control the operation of the CRT monitor, the received signal
must contain not only intensity and color information, but also the timing information required
to correctly implement the raster scan. Figure 3.44 shows the format of the composite video signal
containing this information. In order to reduce flicker (again a historical legacy, since older CRT
monitors could not maintain intensities long enough if the time between refreshes was too long), the
CRT screen is painted in two rounds for each image (or frame): first the odd lines (comprising the
odd field) are scanned, then the even lines (comprising the even field) are scanned. For the NTSC
standard, this is done at a rate of 60 fields per second, or 30 frames per second. A horizontal sync
pulse is inserted between successive lines. A more complex vertical synchronization waveform is
inserted between successive fields; this enables vertical synchronization (as well as other
functionalities that we do not discuss here). The receiver can extract the horizontal and vertical
timing information from the composite video signal, and generate the sawtooth waveforms required
for controlling the electron beam (one of the first widespread commercial applications of the PLL
was for this purpose). For the NTSC standard, the composite video signal spans 525 lines, about
486 of which are actually painted (counting both the even and odd fields). The remaining 39 lines
accommodate the vertical synchronization waveforms.

Figure 3.44: The structure of a black and white composite video signal (numbers apply to the
NTSC standard).

The  bandwidth  of  the  baseband  video  signal  can  be  roughly  estimated  as  follows.  Assuming 
about  480  lines,  with  about  640  pixels  per  line  (for  an  aspect  ratio  of  4:3),  we  have  about  300,000 
pixels,  refreshed  at  the  rate  of  30  times  per  second.  Thus,  our  overall  sampling  rate  is  about 
9  Msamples/second.  This  can  accurately  represent  a  signal  of  bandwidth  4.5  MHz.  For  a  6 
MHz  TV  channel  bandwidth,  DSB  and  wideband  FM  are  therefore  out  of  the  question,  and 
VSB  was  chosen  to  modulate  the  composite  video  signal.  However,  the  careful  shaping  of  the 
spectrum  required  for  VSB  is  not  carried  out  at  the  transmitter,  because  this  would  require 
the  design  of  high-power  electronics  with  tight  specifications.  Instead,  the  transmitter  uses  a 
simple  filter,  while  the  receiver,  which  deals  with  a  low-power  signal,  accomplishes  the  VSB 
shaping  requirement  in  (3.21).  Audio  modulation  is  done  using  FM  in  a  band  adjacent  to  the 
one  carrying  the  video  signal. 
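The rough bandwidth estimate above can be reproduced in a few lines. The sketch uses the rounded figures from the text (480 lines, 640 pixels per line, 30 frames per second) and treats each pixel as one sample; the variable names are introduced here.

```python
lines = 480
pixels_per_line = 640              # 4:3 aspect ratio with 480 lines
frames_per_second = 30
channel_bandwidth_hz = 6.0e6       # TV channel bandwidth from the text

pixels_per_frame = lines * pixels_per_line            # about 300,000 pixels
sample_rate = pixels_per_frame * frames_per_second    # about 9 Msamples/second
baseband_bandwidth_hz = sample_rate / 2               # bandwidth representable at this rate
dsb_bandwidth_hz = 2 * baseband_bandwidth_hz          # DSB doubles the baseband bandwidth

print(pixels_per_frame, sample_rate, baseband_bandwidth_hz)
print(dsb_bandwidth_hz > channel_bandwidth_hz)        # DSB does not fit in the channel
```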

While  the  signaling  for  black  and  white  TV  is  essentially  the  same  for  all  existing  analog  TV 
standards,  the  insertion  of  color  differs  among  standards  such  as  NTSC,  PAL  and  SECAM.  We 
do not go into details here, but, taking NTSC as an example, we note that the frequency domain
characteristics of the black and white composite video signal are exploited in rather a clever way
to  insert  color  information.  The  black  and  white  signal  exhibits  a  clustering  of  power  around  the 
Fourier  series  components  corresponding  to  the  horizontal  scan  rate,  with  the  power  decaying 
around  the  higher  order  harmonics.  The  color  modulated  signal  uses  the  same  band  as  the  black 
and  white  signal,  but  is  inserted  between  two  such  harmonics,  so  as  to  minimize  the  mutual 
interference  between  the  intensity  information  and  the  color  information.  The  color  information 
is  encoded  in  two  baseband  signals,  which  are  modulated  on  to  the  I  and  Q  components  using 
QAM.  Synchronization  information  that  permits  coherent  recovery  of  the  color  subcarrier  for 
quadrature  demodulation  is  embedded  in  the  vertical  synchronization  waveform. 
3.7 Concept Summary


This  chapter  provides  an  introduction  to  analog  communication  techniques,  focusing  mainly  on 
concepts  that  remain  relevant  in  the  age  of  digital  communication. 

Amplitude  modulation 

•  Double  sideband  (DSB)  suppressed  carrier  (SC)  modulation  refers  to  translation  of  a  real 
baseband  message  to  passband  by  mixing  (multiplying)  it  with  a  sinusoid  at  the  carrier  frequency. 
The  bandwidth  of  a  DSB-SC  signal  is  twice  that  of  the  baseband  message. 

• For DSB-SC, the message is the I component of the passband waveform, and therefore can be
recovered  by  standard  downconversion.  However,  this  requires  carrier  synchronization,  and  is 
vulnerable  to  synchronization  errors. 

•  Conventional  AM  refers  to  a  DSB-SC  signal  plus  a  strong  carrier  component.  An  AM  signal 
with  modulation  index  smaller  than  one  preserves  the  shape  of  the  message  in  the  envelope, 
since  the  latter  equals  the  message  plus  a  DC  offset.  The  message  can  therefore  be  demodulated 
without  carrier  synchronization  using  envelope  detection. 

• The power efficiency of conventional AM can be quite small for messages with large dynamic
ranges, but it does result in simple, low-cost receivers. This is the tradeoff that designers have
traditionally made for broadcast systems such as AM radio in which a single powerful transmitter
serves a large number of receivers. (However, such tradeoffs are somewhat obsolete today, since
advances in integrated circuit implementations of communication receivers imply that receivers
can now be both sophisticated and low-cost.)

• Single sideband (SSB) modulation corresponds to sending only the upper or lower sideband of
a DSB signal. The passband SSB signal therefore has the same bandwidth as the original message.

• The I component of an SSB signal is the message, while the Q component is its Hilbert transform
(with sign depending on which sideband is sent). Thus, an SSB signal can be constructed at
baseband and then upconverted, rather than by stringent filtering at passband to isolate a sideband.

•  The  Hilbert  transform  of  a  message  corresponds  to  imposing  a  phase  lag  of  90°  at  all  (positive) 
frequencies. 

•  The  message  can  be  recovered  from  an  SSB  signal  using  synchronous  downconversion,  but 
this  is  vulnerable  to  carrier  synchronization  errors.  Another  option  is  to  add  a  strong  carrier 
component  at  the  transmitter  and  to  employ  envelope  detection. 

•  Vestigial  sideband  modulation  (VSB)  may  be  viewed  as  a  generalization  of  SSB,  in  which  a  DSB 
signal  is  filtered  so  as  to  let  through  one  sideband  and  a  vestige  of  the  other.  For  an  appropriately 
designed  VSB  filter,  the  I  component  of  the  VSB  signal  is  the  message,  while  the  Q  component 
is  a  filtered  version  of  the  message  whose  form  depends  on  the  VSB  filter.  Demodulation  can  be 
performed  as  for  an  SSB  signal. 

•  Quadrature  amplitude  modulation  (QAM)  corresponds  to  sending  different  messages  on  both 
the  I  and  Q  components  of  a  passband  signal.  It  requires  synchronous  demodulation.  QAM  is  a 
popular  design  approach  for  digital  communication,  and  its  use  for  this  purpose  is  discussed  in 
Chapter  4  and  beyond. 
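
The I/Q construction of SSB summarized above can be checked numerically for a single tone. The following is a minimal Python sketch, not taken from the text: it assumes the message m(t) = cos 2πf_m t, whose Hilbert transform is sin 2πf_m t, and the function name `ssb_tone` and the frequencies are hypothetical choices made only for illustration.

```python
import math

def ssb_tone(t, fc, fm, upper=True):
    """One sample of an SSB signal built from its I and Q components.

    Assumes the message m(t) = cos(2*pi*fm*t), whose Hilbert transform
    is m_hat(t) = sin(2*pi*fm*t), i.e. a 90 degree phase lag.
    """
    m = math.cos(2 * math.pi * fm * t)
    m_hat = math.sin(2 * math.pi * fm * t)
    sign = 1.0 if upper else -1.0  # Q = +m_hat gives USB, Q = -m_hat gives LSB
    i_part = m * math.cos(2 * math.pi * fc * t)
    q_part = sign * m_hat * math.sin(2 * math.pi * fc * t)
    return i_part - q_part  # u(t) = I(t) cos(2 pi fc t) - Q(t) sin(2 pi fc t)

fc, fm = 100.0, 3.0  # illustrative values only
times = [k / 5000.0 for k in range(5000)]
# The USB signal should be a single tone at fc + fm, the LSB signal a tone at fc - fm.
usb_error = max(abs(ssb_tone(t, fc, fm, True) - math.cos(2 * math.pi * (fc + fm) * t))
                for t in times)
lsb_error = max(abs(ssb_tone(t, fc, fm, False) - math.cos(2 * math.pi * (fc - fm) * t))
                for t in times)
```

Both errors are at the level of floating point roundoff, which is consistent with only one sideband being present in each case.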

Angle  modulation 

• Angle modulation refers to the baseband message being embedded in the frequency/phase of
a  passband  waveform. 

•  Frequency  modulation  may  be  interpreted  as  phase  modulation  with  the  integral  of  the  message. 
This  smoothing  operation  leads  to  better  spectral  properties,  hence  FM  is  preferred  for  analog 
communication,  where  the  message  waveform  is  given  by  nature.  However,  phase  modulation 
is  often  used  for  digital  communication,  where  the  designer  can  carefully  shape  the  transmitted 
waveforms. 

•  The  bandwidth  of  an  FM  waveform  depends  on  both  the  maximum  frequency  deviation  and 
the  message  bandwidth,  and  can  be  approximated  using  Carson’s  formula. 

•  For  periodic  messages,  the  complex  envelope  of  an  FM  waveform  is  periodic.  Fourier  series 
can  therefore  be  used  to  characterize  the  spectrum  of  the  complex  envelope,  and  hence  the  FM 
waveform.

•  A  simple-to-understand,  suboptimal  demodulator  for  FM  is  the  limiter-discriminator,  which 
consists  of  differentiation  followed  by  envelope  detection.  However,  feedback-based  techniques 
such  as  the  PLL  provide  superior  performance  for  analog  communication,  while  demodulation 
techniques  exploiting  the  structure  of  the  message  are  preferred  for  digital  communication. 
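
As a small worked example of Carson’s formula, the sketch below computes the approximate FM bandwidth as twice the sum of the maximum frequency deviation and the message bandwidth. The function name and the numerical values (75 KHz deviation, 15 KHz message bandwidth, typical of broadcast FM) are assumptions introduced here for illustration.

```python
def carson_bandwidth(max_freq_deviation, message_bandwidth):
    """Carson's formula: B_FM is approximately 2 * (max deviation + message bandwidth)."""
    return 2.0 * (max_freq_deviation + message_bandwidth)

# Illustrative numbers (assumed, not from the text): 75 KHz peak deviation
# and a 15 KHz audio message, both expressed in Hz.
bandwidth_hz = carson_bandwidth(75e3, 15e3)
print(bandwidth_hz)  # 180000.0
```

Any consistent unit can be used for the two arguments; the result is in the same unit.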


Superheterodyne  receiver 

•  The  superhet  receiver  achieves  downconversion  in  multiple  stages,  mixing  the  passband  received 
waveform  at  RF  down  to  another  passband  waveform  at  IF  by  beating  it  against  a  local  oscillator 
offset from the desired carrier frequency by f_IF. This is followed by demodulation to baseband
by  any  of  a  variety  of  techniques,  including  coherent  downconversion  and  envelope  detection. 

•  For  operation  over  multiple  bands,  the  RF  filter  and  the  LO  are  tunable.  The  specifications  on 
the  RF,  or  image  reject,  filter  can  be  fairly  relaxed,  since  its  key  function  is  to  reject  the  image 
frequency (separated from the desired band by 2f_IF).

•  The  image  reject  filter  typically  lets  in  bands  adjacent  to  the  desired  frequency,  hence  the  IF 
filter  must  have  sharp  cutoffs  in  order  to  suppress  “adjacent  channel  interference”  from  these 
bands.  The  fact  that  the  IF  is  fixed  facilitates  designing  to  such  tight  specifications. 
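
The relationship between the carrier, the LO and the image frequency can be made concrete with a few lines of arithmetic. The sketch below is an illustration under stated assumptions: the function name `superhet_plan` is hypothetical, and the example station frequency of 1000 KHz is chosen arbitrarily (the 455 KHz IF is the value commonly used for AM radio).

```python
def superhet_plan(f_carrier, f_if, high_side=True):
    """Return (LO frequency, image frequency) for a superhet front end.

    The LO is offset from the desired carrier by f_if, and the image lies
    on the other side of the LO, 2 * f_if away from the desired carrier.
    """
    if high_side:
        f_lo = f_carrier + f_if
        f_image = f_lo + f_if
    else:
        f_lo = f_carrier - f_if
        f_image = f_lo - f_if
    return f_lo, f_image

# Assumed example: station at 1000 KHz, IF of 455 KHz, high-side LO (all in KHz).
f_lo, f_image = superhet_plan(1000.0, 455.0)
print(f_lo, f_image)  # 1455.0 1910.0
```

The image reject filter only needs to suppress the band around the image frequency, which is why its specifications can be relaxed when the IF is large.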

Phase  locked  loop 

• The PLL enables phase/frequency tracking of a desired passband waveform using feedback.
The  phase  difference  between  the  desired  waveform  and  a  locally  generated  copy,  obtained  using 
a  phase  detector,  is  filtered  using  a  loop  filter,  and  is  fed  back  to  drive  the  VCO  generating  the 
local  copy. 

•  The  PLL  can  be  used  for  a  variety  of  functions,  including  FM  demodulation  and  frequency 
synthesis  (the  latter  is  its  most  important  application  in  the  digital  age). 

•  Classically,  phase  detectors  were  implemented  using  mixers,  but  modern  implementations  often 
use  mixed  signal  (digital  and  analog)  approaches. 

• The phase detector is nonlinear, but PLLs are often analyzed using linearized models characterized by the transfer function between the input phase and the output phase. The order of the PLL is the degree of the denominator of this transfer function in the Laplace domain.

•  First  order  PLLs  (trivial  loop  filter)  can  track  phase  jumps  but  not  frequency  jumps.  Second 
order  PLLs  (loop  filter  provides  proportional  plus  integral  feedback)  can  track  both  phase  and 
frequency  jumps. 
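
The difference between first and second order loops can be seen in a simple simulation of the linearized model. The sketch below is a hedged illustration, not the text's method: it uses forward Euler integration, models the VCO as 1/s, and assumes a second order loop filter of the form G(s) = (s + a)/s (proportional plus integral). The function name, loop gain, filter parameter and step size are all hypothetical choices.

```python
import math

def pll_phase_error(freq_jump_hz, loop_gain, a=None, t_end=5.0, dt=1e-3):
    """Euler simulation of the linearized PLL; returns the final phase error in radians.

    With a=None the loop filter is G(s) = 1 (first order PLL). Otherwise the
    filter is assumed to be G(s) = (s + a)/s, which gives a second order PLL.
    """
    delta_omega = 2 * math.pi * freq_jump_hz  # input phase ramps as delta_omega * t
    error = 0.0      # input phase minus VCO output phase
    integral = 0.0   # running integral of the phase error
    for _ in range(int(round(t_end / dt))):
        vco_input = error if a is None else error + a * integral
        d_error = delta_omega - loop_gain * vco_input
        integral += error * dt
        error += d_error * dt
    return error

first_order = pll_phase_error(1.0, 10.0)          # settles at delta_omega / K
second_order = pll_phase_error(1.0, 10.0, a=5.0)  # settles at zero
```

For the first order loop the error settles at the frequency jump (in radians/second) divided by the loop gain, whereas the integral term in the second order loop drives the error to zero.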


3.8  Endnotes 

A towering figure in the history of analog communication techniques is Edwin Howard Armstrong, who invented the regenerative circuit (for feedback-based amplification), the superhet
receiver,  and  FM.  Mentioning  Armstrong  gives  us  the  opportunity  for  a  quick  discussion  on 
the  evolution  of  design  philosophy  from  analog  to  digital  communication.  From  what  we  know 
about  Armstrong,  he  was  a  clear  thinker  who  would  use  systematic  experimentation  rather  than 
trial  and  error,  along  with  physical  intuition,  to  arrive  at  his  inventions.  However,  he  distrusted 
results  obtained  only  using  mathematics  because  of  the  potential  for  hidden,  and  potentially 
flawed,  assumptions  in  mathematical  models.  While  some  amount  of  skepticism  of  this  nature 
is  warranted,  it  is  worth  noting  that  digital  communication  would  not  exist  today  if  it  were  not 
for  mathematical  abstractions  of  the  physical  world.  Indeed,  as  mentioned  in  Chapter  7,  Claude 
Shannon  created  information  theory  as  a  mathematical  framework  promising  the  existence  of 
reliable  and  efficient  communication  systems  in  1948,  but  it  took  many  decades  of  effort  by 
communication  system  designers  to  build  practical  systems  approaching  information-theoretic 
limits. Shannon’s promise, based purely on idealized mathematical models (which Armstrong would perhaps not have approved of), was essential in motivating this effort. Furthermore, as we gain more confidence in the accuracy of our mathematical models, they play a bigger role in
design,  as  we  shall  see  when  we  transition  from  the  piecemeal  ideas  in  this  chapter  to  the  more 
systematic  framework  for  digital  communication  in  forthcoming  chapters.  Specifically,  the  digital 
communication  system  designer  today  employs  sophisticated  and  accurate  mathematical  models 
for  communication  channels  (developed  based  on  Armstrong’s  blend  of  physical  intuition  and 
experimentation) to establish systematic principles for practical transceiver design and implementation that approach the performance limits promised by Shannon’s theoretical framework.

Since  our  treatment  of  analog  communication  techniques  here  emphasizes  those  that  remain 
relevant  in  the  digital  age,  we  refer  readers  interested  in  a  deeper  look  at  analog  communication 
techniques to the excellent treatment in Ziemer and Tranter [4]. For classic treatments of the PLL, see Gardner [21] and Viterbi [22] (the latter provides analysis for a nonlinear PLL model). More recent books include those by Best [23] and Razavi [24].


3.9  Problems 


Amplitude  modulation 


Figure  3.45:  Amplitude  modulated  signal  for  Problem  3.1. 


Problem  3.1  Figure  3.45  shows  a  signal  obtained  after  amplitude  modulation  by  a  sinusoidal 
message. The carrier frequency is difficult to determine from the figure, and is not needed for
answering  the  questions  below. 

(a)  Find  the  modulation  index. 

(b)  Find  the  signal  power. 

(c)  Find  the  bandwidth  of  the  AM  signal. 


Problem 3.2 Consider a message signal $m(t) = 2\cos(2\pi t + \cdots)$

(a) Sketch the spectrum $U_p(f)$ of the DSB-SC signal $u_p(t) = 8m(t)\cos 400\pi t$. What is the power of $u_p$?

(b) Carefully sketch the output of an ideal envelope detector with input $u_p$. On the same plot, sketch the message signal $m(t)$.

(c) Let $v_p(t)$ denote the waveform obtained by high-pass filtering the signal $u_p(t)$ so as to let through only frequencies above 200 Hz. Find $v_c(t)$ and $v_s(t)$ such that we can write

$v_p(t) = v_c(t)\cos 400\pi t - v_s(t)\sin 400\pi t$

and sketch the envelope of $v_p$.

Problem  3.3  A  message  to  be  transmitted  using  AM  is  given  by 

$m(t) = 3\cos 2\pi t + 4\sin 6\pi t$

where  the  unit  of  time  is  milliseconds.  It  is  to  be  sent  using  a  carrier  frequency  of  600  KHz. 

(a)  What  is  the  message  bandwidth?  Sketch  its  magnitude  spectrum,  clearly  specifying  the  units 
used  on  the  frequency  axis. 

(b)  Find  an  expression  for  the  normalized  message  mn(t). 

(c)  For  a  modulation  index  of  50%,  write  an  explicit,  time  domain  expression  for  the  AM  signal. 

(d)  What  is  the  power  efficiency  of  the  AM  signal? 

(e)  Sketch  the  magnitude  spectrum  for  the  AM  signal,  again  clearly  specifying  the  units  used  on 
the  frequency  axis. 

(f)  The  AM  signal  is  to  be  detected  using  an  envelope  detector  (as  shown  in  Figure  3.8),  with 
R = 50 ohms. What is a good range of choices for the capacitance C?

Problem 3.4 Consider a message signal $m(t) = \cos(2\pi f_m t + \phi)$, and a corresponding DSB-SC signal $u_p(t) = A m(t)\cos 2\pi f_c t$, where $f_c > f_m$.

(a) Sketch the spectra of the corresponding LSB and USB signals (if the spectrum is complex-valued, sketch the real and imaginary parts separately).

(b)  Find  explicit  time  domain  expressions  for  the  LSB  and  USB  signals. 

Problem 3.5 One way of avoiding the use of a mixer in generating AM is to pass $x(t) = m(t) + a\cos 2\pi f_c t$ through a memoryless nonlinearity and then a bandpass filter.

(a) Suppose that $M(f) = (1 - |f|/10) I_{[-10,10]}(f)$ (the unit of frequency is KHz) and $f_c$ is 900 KHz. For a nonlinearity $f(x) = \beta x^2 + x$, sketch the magnitude spectrum at the output of the nonlinearity when the input is $x(t)$, carefully labeling the frequency axis.

(b) For the specific settings in (a), characterize the bandpass filter that you should use at the output of the nonlinearity so as to generate an AM signal carrying the message $m(t)$. That is, describe the set of frequencies that the BPF must reject, and those that it must pass.


Problem  3.6  Consider  a  DSB  signal  corresponding  to  the  message  m(t)  =  sinc(2t)  and  a  carrier 
frequency  fc  which  is  100  times  larger  than  the  message  bandwidth,  where  the  unit  of  time  is 
milliseconds. 

(a) Sketch the magnitude spectrum of the DSB signal $10m(t)\cos 2\pi f_c t$, specifying the units on the frequency axis.

(b)  Specify  a  time  domain  expression  for  the  corresponding  LSB  signal. 

(c) Now, suppose that the DSB signal is passed through a bandpass filter whose transfer function is given by

$H_p(f) = \left(f - f_c + \tfrac{1}{2}\right) I_{[f_c - \frac{1}{2},\, f_c + \frac{1}{2}]}(f) + I_{[f_c + \frac{1}{2},\, f_c + \frac{3}{2}]}(f), \quad f > 0$

Sketch the magnitude spectrum of the corresponding VSB signal.

(d) Find a time domain expression for the VSB signal of the form

$u_c(t)\cos 2\pi f_c t - u_s(t)\sin 2\pi f_c t$

carefully specifying $u_c$ and $u_s$, the I and Q components.


Figure 3.46: Block diagram of Weaver’s SSB modulator for Problem 3.7.


Problem 3.7 Figure 3.46 shows a block diagram of Weaver’s SSB modulator, which works if we choose $f_1$, $f_2$ and the bandwidth of the lowpass filter appropriately. Let us work through these choices for a waveform of the form $m(t) = A_L\cos(2\pi f_L t + \phi_L) + A_H\cos(2\pi f_H t + \phi_H)$, where $f_H > f_L$ (the design choices we obtain will work for any message whose spectrum lies in the band $[f_L, f_H]$).

(a) For $f_1 = (f_L + f_H)/2$ (i.e., choosing the first LO frequency to be in the middle of the message band), find the time domain waveforms at the outputs of the upper and lower branches after the first mixer.

(b)  Choose  the  bandwidth  of  the  lowpass  filter  to  be  W  =  fH+^L  (assume  the  lowpass  filter  is 
ideal).  Find  the  time  domain  waveforms  at  the  outputs  of  the  upper  and  lower  branches  after 
the  LPF. 

(c) Now, assuming that $f_2 \gg f_H$, find a time domain expression for the output waveform, assuming that the upper and lower branches are added together. Is this an LSB or USB waveform? What is the carrier frequency?

(d)  Repeat  (c)  when  the  lower  branch  is  subtracted  from  the  upper  branch. 

Remark: Weaver’s modulator does not require bandpass filters with sharp cutoffs, unlike the direct approach to generating SSB waveforms by filtering DSB-SC waveforms. It is also simpler than the Hilbert transform method (the latter requires implementation of a $\pi/2$ phase shift over the entire message band).


Figure 3.47: Bandpass filter for Problem 3.8.


Problem 3.8 Consider the AM signal $u_p(t) = 2(10 + \cos 2\pi f_m t)\cos 2\pi f_c t$, where the message frequency $f_m$ is 1 MHz and the carrier frequency $f_c$ is 885 MHz.

(a) Suppose that we use superheterodyne reception with an IF of 10.7 MHz, and envelope detection after the IF filter. Envelope detection is accomplished as in Figure 3.8, using a diode and an RC circuit. What would be a good choice of C if R = 100 ohms?

(b) The AM signal $u_p(t)$ is passed through the bandpass filter with transfer function $H_p(f)$ depicted (for positive frequencies) in Figure 3.47. Find the I and Q components of the filter output with respect to reference frequency $f_c$ of 885 MHz. Does the filter output represent a form of modulation you are familiar with?

Problem 3.9 Consider a message signal $m(t)$ with spectrum $M(f) = I_{[-2,2]}(f)$.

(a) Sketch the spectrum of the DSB-SC signal $u_{\mathrm{DSB\text{-}SC}} = 10m(t)\cos 300\pi t$. What is the power and bandwidth of $u$?

(b)  The  signal  in  (a)  is  passed  through  an  envelope  detector.  Sketch  the  output,  and  comment 
on  how  it  is  related  to  the  message. 

(c) What is the smallest value of $A$ such that the message can be recovered without distortion from the AM signal $u_{AM} = (A + m(t))\cos 300\pi t$ by envelope detection?

(d) Give a time-domain expression of the form

$u_p(t) = u_c(t)\cos 300\pi t - u_s(t)\sin 300\pi t$

obtained by high-pass filtering the DSB signal in (a) so as to let through only frequencies above 150 Hz.

(e) Consider a VSB signal constructed by passing the signal in (a) through a passband filter with transfer function for positive frequencies specified by:

$H_p(f) = \begin{cases} f - 149, & 149 \le f \le 151 \\ 2, & f \ge 151 \end{cases}$

(you should be able to sketch $H_p(f)$ for both positive and negative frequencies.) Find a time domain expression for the VSB signal of the form

$u_p(t) = u_c(t)\cos 300\pi t - u_s(t)\sin 300\pi t$

Problem 3.10 Consider Figure 3.17 depicting VSB spectra. Suppose that the passband VSB filter $H_p(f)$ is specified (for positive frequencies) as follows:

$H_p(f) = \begin{cases} 1, & 101 \le f \le 102 \\ \frac{1}{2}(f - 99), & 99 \le f \le 101 \\ 0, & \text{else} \end{cases}$

(a)  Sketch  the  passband  transfer  function  Hp(f )  for  both  positive  and  negative  frequencies. 

(b)  Sketch  the  spectrum  of  the  complex  envelope  H(f),  taking  fc  =  100  as  a  reference. 

(c)  Sketch  the  spectra  (show  the  real  and  imaginary  parts  separately)  of  the  I  and  Q  components 
of  the  impulse  response  of  the  passband  filter. 

(d) Consider a message signal of the form $m(t) = 4\,\mathrm{sinc}(4t) - 2\cos 2\pi t$. Sketch the spectrum of the DSB signal that results when the message is modulated by a carrier at $f_c = 100$.

(e)  Now,  suppose  that  the  DSB  signal  in  (d)  is  passed  through  the  VSB  filter  in  (a)-(c).  Sketch  the 
spectra  of  the  I  and  Q  components  of  the  resulting  VSB  signal,  showing  the  real  and  imaginary 
parts  separately. 

(f)  Find  a  time  domain  expression  for  the  Q  component. 


Problem 3.11 Consider the periodic signal $m(t) = \sum_{n=-\infty}^{\infty} p(t - 2n)$, where $p(t) = t I_{[-1,1]}(t)$.

(a) Sketch the AM signal $x(t) = (4 + m(t))\cos 100\pi t$.

(b)  What  is  the  power  efficiency? 


Problem 3.12 Find an explicit time domain expression for the Hilbert transform of $m(t) = \mathrm{sinc}(2t)$.




Superheterodyne  reception 

Problem  3.13  A  dual  band  radio  operates  at  900  MHz  and  1.8  GHz.  The  channel  spacing  in 
each  band  is  1  MHz.  We  wish  to  design  a  superheterodyne  receiver  with  an  IF  of  250  MHz.  The 
LO  is  built  using  a  frequency  synthesizer  that  is  tunable  from  1.9  to  2.25  GHz,  and  frequency 
divider  circuits  if  needed  (assume  that  you  can  only  implement  frequency  division  by  an  integer). 

(a)  How  would  you  design  a  superhet  receiver  to  receive  a  passband  signal  restricted  to  the  band 
1800-1801  MHz?  Specify  the  characteristics  of  the  RF  and  IF  filters,  and  how  you  would  choose 
and  synthesize  the  LO  frequency. 

(b)  Repeat  (a)  when  the  signal  to  be  received  lies  in  the  band  900-901  MHz. 


Angle  modulation 

Figure 3.48: Phase deviation of FM signal for Problem 3.14.


Problem  3.14  Figure  3.48  shows,  as  a  function  of  time,  the  phase  deviation  of  a  bandpass  FM 
signal  modulated  by  a  sinusoidal  message. 

(a) Find the modulation index (assume that it is an integer multiple of $\pi$ for your estimate).

(b)  Find  the  message  bandwidth. 

(c)  Estimate  the  bandwidth  of  the  FM  signal  using  Carson’s  formula. 


Problem 3.15 The input $m(t)$ to an FM modulator with $k_f = 1$ has Fourier transform

$M(f) = \begin{cases} j2\pi f, & |f| < 1 \\ 0, & \text{else} \end{cases}$

The output of the FM modulator is given by

$u(t) = A\cos(2\pi f_c t + \phi(t))$

where $f_c$ is the carrier frequency.

(a) Find an explicit time domain expression for $\phi(t)$ and carefully sketch $\phi(t)$ as a function of time.

(b)  Find  the  magnitude  of  the  instantaneous  frequency  deviation  from  the  carrier  at  time  t  — 

(c)  Using  the  result  from  (b)  as  an  approximation  for  the  maximum  frequency  deviation,  estimate 
the  bandwidth  of  u{t). 

Problem 3.16 Let $p(t) = I_{[-\frac{1}{2}, \frac{1}{2}]}(t)$ denote a rectangular pulse of unit duration. Construct the signal

$m(t) = \sum_{n=-\infty}^{\infty} (-1)^n p(t - n)$

The signal $m(t)$ is input to an FM modulator, whose output is given by

$u(t) = 20\cos(2\pi f_c t + \phi(t))$

where

$\phi(t) = 20\pi \int_{-\infty}^{t} m(\tau)\,d\tau + a$

and $a$ is chosen such that $\phi(0) = 0$.

(a) Carefully sketch both $m(t)$ and $\phi(t)$ as a function of time.

(b) Approximating the bandwidth of $m(t)$ as $W \approx 2$, estimate the bandwidth of $u(t)$ using Carson’s formula.

(c)  Suppose  that  a  very  narrow  ideal  BPF  (with  bandwidth  less  than  0.1)  is  placed  at  fc  +  a. 
For  which  (if  any)  of  the  following  choices  of  a  will  you  get  nonzero  power  at  the  output  of  the 
BPF:  (i)  a  =  .5,  (ii)  a  =  .75,  (iii)  a  =  1. 

Problem 3.17 Let $u(t) = 20\cos(2000\pi t + \phi(t))$ denote an angle modulated signal.

(a) For $\phi(t) = 0.1\cos 2\pi t$, what is the approximate bandwidth of $u$?

(b) Let $y(t) = u^{12}(t)$. Specify the frequency bands spanned by $y(t)$. In particular, specify the output when $y$ is passed through:

(i) A BPF centered at 12 KHz. Using Carson’s formula, determine the bandwidth of the BPF required to recover most of the information in $\phi(t)$ from the output.

(ii) An ideal LPF of bandwidth 200 Hz.

(iii) A BPF of bandwidth 100 Hz centered at 11 KHz.

(c) For $\phi(t) = \sum_n s(t - 2n)$, where $s(t) = (1 - |t|) I_{[-1,1]}(t)$:

(i) Sketch the instantaneous frequency deviation from the carrier frequency of 1 KHz.

(ii) Show that we can write

$u(t) = \sum_n c_n \cos(2000\pi t + n a t)$

Specify $a$, and write down an explicit integral expression for $c_n$.


Problem  3.18  Consider  the  set-up  of  Problem  3.16,  taking  the  unit  of  time  in  milliseconds  for 
concreteness.  You  do  not  need  the  value  of  fc,  but  you  can  take  it  to  be  1  MHz. 

(a) Numerically (e.g., using Matlab) compute the Fourier series expansion for the complex envelope of the FM waveform, in the same manner as was done for a sinusoidal message. Report the magnitudes of the Fourier series coefficients for the first 5 harmonics.

(b)  Find  the  90%,  95%  and  99%  power  containment  bandwidths.  Compare  with  the  estimate 
from  Carson’s  formula  obtained  in  Problem  3.16(b). 


Problem 3.19 A VCO with a quiescent frequency of 1 GHz and a frequency sweep of 2 MHz/mV produces an angle modulated signal whose phase deviation $\theta(t)$ from a carrier frequency $f_c$ of 1 GHz is shown in Figure 3.49.

(a)  Sketch  the  input  m(t )  to  the  VCO,  carefully  labeling  both  the  voltage  and  time  axes. 

(b) Estimate the bandwidth of the angle modulated signal at the VCO output. You may approximate the bandwidth of a periodic signal by that of its first harmonic.

[Figure: block diagram in which m(t) drives a VCO (2 MHz/mV) producing cos(2πf_c t + θ(t)), together with a plot of θ(t) versus t (microseconds).]

Figure 3.49: Set-up for Problem 3.19.


Uncategorized  problems 

Problem 3.20 The signal $m(t) = 2\cos 20\pi t - \cos 40\pi t$, where the unit of time is milliseconds, and the unit of amplitude is millivolts (mV), is fed to a VCO with quiescent frequency of 5 MHz and frequency deviation of 100 KHz/mV. Denote the output of the VCO by $y(t)$.

(a)  Provide  an  estimate  of  the  bandwidth  of  y. 

(b) The signal $y(t)$ is passed through an ideal bandpass filter of bandwidth 5 KHz, centered at
5.005  MHz.  Describe  in  detail  how  you  would  compute  the  power  at  the  filter  output  (if  you  can 
compute  the  power  in  closed  form,  do  so). 


[Figure: message m(t) taking the values 10 mV and -10 mV, plotted against t (ms) with the time axis marked at 0, 1, 2 and 3.]

Figure 3.50: Message signal for Problems 3.21 and 3.22.


[Figure: downconverter with two branches, each ending in a lowpass filter, producing the outputs u_c(t) and u_s(t).]

Figure 3.51: Downconversion using 201 KHz LO (t in ms in the figure) for Problem 3.21(b)-(c).


Problem 3.21 Consider the AM signal $u_p(t) = (A + m(t))\cos 400\pi t$ (t in ms) with message signal $m(t)$ as in Figure 3.50, where $A$ is 10 mV.

(a) If the AM signal is demodulated using an envelope detector with an RC filter, how should you choose C if R = 500 ohms? Try to ensure that the first harmonic (i.e., the fundamental) and the third harmonic of the message are reproduced with minimal distortion.

(b) Now, consider an attempt at synchronous demodulation, where the AM signal is downconverted using a 201 KHz LO, as shown in Figure 3.51. Find and sketch the I and Q components, $u_c(t)$ and $u_s(t)$, for 0 < t < 2 (t in ms).

(c)  Describe  how  you  would  recover  the  original  message  m(t)  from  the  downconverter  outputs 
uc(t)  and  us(t),  drawing  block  diagrams  as  needed. 


Problem  3.22  The  square  wave  message  signal  m(t)  in  Figure  3.50  is  input  to  a  VCO  with 
quiescent  frequency  200  KHz  and  frequency  deviation  1  KHz/mV.  Denote  the  output  of  the 
VCO  by  up(t). 

(a)  Sketch  the  I  and  Q  components  of  the  FM  signal  (with  respect  to  a  frequency  reference  of 
200  KHz  and  a  phase  reference  chosen  such  that  the  phase  is  zero  at  time  zero)  over  the  time 
interval  0  <  t  <  2  (t  in  ms),  clearly  labeling  the  axes. 

(b)  In  order  to  extract  the  I  and  Q  components  using  a  standard  downconverter  (mix  with  LO 
and  then  lowpass  filter),  how  would  you  choose  the  bandwidth  of  the  LPFs  used  at  the  mixer 
outputs? 


Figure  3.52:  Phase  Evolution  in  Problem  3.23. 


Problem 3.23 The output of an FM modulator is the bandpass signal $y(t) = 10\cos(300\pi t + \phi(t))$, where the unit of time is milliseconds, and the phase $\phi(t)$ is as sketched in Figure 3.52.

(a) Suppose that $y(t)$ is the output of a VCO with frequency deviation 1 KHz/mV and quiescent frequency 149 KHz. Find and sketch the input to the VCO.

(b)  Use  Carson’s  formula  to  estimate  the  bandwidth  of  y(t),  clearly  stating  the  approximations 
that  you  make. 


Phase  locked  loop 

Set-up  for  PLL  problems:  For  the  next  few  problems  on  PLL  modeling  and  analysis,  consider 
the  linearized  model  in  Figure  3.38,  with  the  following  notation:  loop  filter  G(s),  loop  gain  K, 
and  VCO  modeled  as  1/s.  Recall  from  your  background  on  signals  and  systems  that  a  second 
order system of the form $\frac{1}{s^2 + 2\zeta\omega_n s + \omega_n^2}$ is said to have natural frequency $\omega_n$ (in radians/second) and damping factor $\zeta$.
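
To make the definitions concrete, the following sketch reads off the natural frequency and damping factor from a second order denominator written as s^2 + b s + c, by matching b = 2ζω_n and c = ω_n^2. The example assumes, purely for illustration, a loop filter of the form G(s) = (s + a)/s, for which the closed-loop denominator of the linearized PLL is s^2 + K s + K a; the function name and the numerical values are hypothetical.

```python
import math

def natural_frequency_and_damping(b, c):
    """For the denominator s^2 + b*s + c, return (omega_n, zeta).

    Matches coefficients with s^2 + 2*zeta*omega_n*s + omega_n^2.
    """
    omega_n = math.sqrt(c)
    zeta = b / (2.0 * omega_n)
    return omega_n, zeta

# Assumed example: G(s) = (s + a)/s with loop gain K and VCO 1/s gives the
# closed-loop denominator s^2 + K*s + K*a.
K, a = 4.0, 4.0
omega_n, zeta = natural_frequency_and_damping(K, K * a)
print(omega_n, zeta)  # 4.0 0.5
```

Under this assumed filter, increasing K with a fixed raises both the natural frequency and the damping factor, since ω_n = sqrt(K a) and ζ = K / (2 ω_n).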


Problem 3.24 Let $H(s)$ denote the gain from the PLL input to the output of the VCO. Let $H_e(s)$ denote the gain from the PLL input to the input to the loop filter. Let $H_m(s)$ denote the gain from the PLL input to the VCO input.

(a) Write down the formulas for $H(s)$, $H_e(s)$, $H_m(s)$, in terms of $K$ and $G(s)$.

(b)  Which  is  the  relevant  transfer  function  if  the  PLL  is  being  used  for  FM  demodulation? 

(c)  Which  is  the  relevant  transfer  function  if  the  PLL  is  being  used  for  carrier  phase  tracking? 

(d)  For  G(s)  =  —  and  K  =  2,  write  down  expressions  for  H(s),  He(s)  and  Hm(s).  What  is  the 
natural  frequency  and  the  damping  factor? 

Problem 3.25 Suppose the PLL input exhibits a frequency jump of 1 KHz.

(a)  How  would  you  choose  the  loop  gain  K  for  a  first  order  PLL  (G(s)  =  1)  to  ensure  a  steady 
state  error  of  at  most  5  degrees? 

(b)  How  would  you  choose  the  parameters  a  and  K  for  a  second  order  PLL  (G(s)  =  to  have 
a  natural  frequency  of  1.414  KHz  and  a  damping  factor  of  Specify  the  units  for  a  and  K. 

(c)  For  the  parameter  choices  in  (b),  find  and  roughly  sketch  the  phase  error  as  a  function  of 
time  for  a  frequency  jump  of  1  KHz. 


Problem  3.26  Suppose  that  G(s )  =  and  x  =  4. 

(a)  Find  the  transfer  function  ^44- 

(b) Suppose that the PLL is used for FM demodulation, with the input to the PLL being an

FM  signal  with  instantaneous  frequency  deviation  of  the  FM  signal  where  the  message 

m(t)  =  2  cos  t  +  sin  2t.  Using  the  linearized  model  for  the  PLL,  find  a  time  domain  expression 
for  the  estimated  message  provided  by  the  PLL-based  demodulator. 

Hint: What happens to a sinusoid of frequency $\omega$ passing through a linear system with transfer function $H(s)$?


Figure  3.53:  System  for  Problem  3.27. 


Problem 3.27 Consider the PLL depicted in Figure 3.53, with input phase $\phi(t)$. The output signal of interest to us here is $v(t)$, the VCO input. The parameter for the loop filter $G(s)$ is given by $a = 1000\pi$ radians/sec.

(a) Assume that the PLL is locked at time 0, and suppose that $\phi(t) = 1000\pi t\, I_{\{t>0\}}$. Find the limiting value of $v(t)$.

(b) Now, suppose that $\phi(t) = 4\pi\sin 1000\pi t$. Find an approximate expression for $v(t)$. For full credit, simplify as much as possible.

(c)  For  part  (b),  estimate  the  bandwidth  of  the  passband  signal  at  the  PLL  input. 


Quiz  on  analog  communication  systems 

Problem 3.28 Answer the following questions regarding commercial analog communication systems (some of which may no longer exist in your neighborhood).

(a)  (True  or  False)  The  modulation  format  for  analog  cellular  telephony  was  conventional  AM. 

(b)  (Multiple  choice)  FM  was  used  in  analog  TV  as  follows: 

(i)  to  modulate  the  video  signal 

(ii)  to  modulate  the  audio  signal 

(iii)  FM  was  not  used  in  analog  TV  systems. 

(c)  A  superheterodyne  receiver  for  AM  radio  employs  an  intermediate  frequency  (IF)  of  455  KHz, 
and  has  stations  spaced  at  10  KHz.  Comment  briefly  on  each  of  the  following  statements: 

(i) The AM band is small enough that the problem of image frequencies does not occur.

(ii)  A  bandwidth  of  20  KHz  for  the  RF  front  end  is  a  good  choice. 

(iii)  A  bandwidth  of  20  KHz  for  the  IF  filter  is  a  good  choice. 


Software  Lab  3.1:  Amplitude  modulation  and  envelope  detection 

Lab Objectives: The goal of this lab is to illustrate amplitude modulation and envelope detection
using  digitally  modulated  messages.  In  addition  to  using  envelope  detection  to  demodulate 
conventional  amplitude  modulation,  we  also  illustrate  how  it  can  be  used  for  I/Q  downconversion 
when  quadrature  mixers  are  not  available. 

Reading:  Section  3.2  (amplitude  modulation). 

Laboratory  Assignment 

1) Generate a message signal $m(t)$ using binary digital modulation with a sinc pulse by modifying Code Fragment 2.3.2 to use random bits. Set the symbol time $T$ to 1 millisecond. Take a waveform segment spanning $n_s$ symbols and take its Fourier transform by modifying Code Fragment 2.5.1 (or using the code from Lab 1), choosing the length of the FFT (and hence $n_s$) so as to get a frequency resolution of 1 Hz. Plot the magnitude squared of the Fourier transform, divided by $n_s T$, the length of the observation interval. This is an estimate of the power spectral density (PSD) $S_m(f)$, which is formally defined later, in Chapters 4 and 5.

2)  Repeat  the  PSD  estimation  in  1)  over  multiple  runs  and  average  the  estimates,  choosing  the 
number  of  runs  large  enough  so  as  to  get  a  smooth  estimate  of  the  PSD.  Eyeball  the  PSD  to 
estimate  the  bandwidth  of  the  signal  (the  units  should  be  consistent  with  our  assumption  of 
T  =  1  ms). 

3) Now, generate the DSB signal $u(t) = m(t)\cos 2\pi f_c t$, where $f_c = 10/T$. Choose the sampling rate for generating discrete time samples as $4f_c$. Plot the DSB signal over 4 symbols.

4)  Estimate  the  PSD  Su(f)  of  the  DSB  signal  generated  in  3)  by  choosing  a  large  enough  number 
of  symbols  as  in  1),  and  averaging  over  several  runs  as  in  2).  What  is  the  relationship  with  the 
PSD  obtained  in  2)? 

5) Repeat 3) and 4) for the AM signal $u(t) = (A_c + m(t))\cos 2\pi f_c t$, where $f_c = 10/T$ and $A_c$ is chosen to have the smallest possible value that allows envelope detection. As before, choose the sampling rate for generating discrete time samples as $4f_c$. Do you run into difficulty when computing the PSD? Explain.

6)  Starting  with  an  AM  signal  as  in  5),  implement  an  envelope  detector  as  follows: 

(a) Pass u(t) through an idealized diode to obtain $u_+(t) = u(t) I_{\{u(t) > 0\}}$.

(b) Pass $u_+(t)$ through an RC filter with impulse response $h(t) = e^{-t/RC} I_{\{t \ge 0\}}$. You can use the
contconv function in Lab 1 for this purpose. Choose the value of RC based on the design rule of
thumb discussed in Chapter 3.

(c)  Implement  a  DC  block  simply  by  subtracting  out  the  empirical  mean  from  the  output  of  (b). 
Plot  the  output  of  the  envelope  detector,  along  with  the  original  message  m(t). 
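
The three steps in 6) can be sketched as follows. This is only a minimal sketch under assumptions that the assignment leaves open: the number of symbols, the value $A_c = 1$ (which keeps $A_c + m(t) \ge 0$ when $|m(t)| \le 1$), and the value of RC are illustrative choices, and the built-in conv function, scaled by the sampling interval, stands in for the contconv function from Lab 1.

```matlab
% Sketch of the envelope detector in 6); all parameter values are assumptions.
T = 1e-3;                 % symbol time (1 ms, as in the lab)
fc = 10/T;                % carrier frequency
fs = 4*fc;                % sampling rate suggested in the lab
dt = 1/fs;                % sampling interval
nsamples = round(T/dt);   % samples per symbol
nsymbols = 20;            % assumed number of symbols

% message: random +/-1 symbols modulating a sine pulse on [0,T)
pulse = sin(pi*(0:nsamples-1)'/nsamples);
symbols = sign(rand(nsymbols,1)-0.5);
m = kron(symbols, pulse);          % pulses do not overlap, so kron suffices
t = (0:length(m)-1)'*dt;

Ac = 1;                            % keeps Ac + m(t) >= 0 when |m(t)| <= 1
u = (Ac + m).*cos(2*pi*fc*t);      % AM signal

% (a) idealized diode: keep only the positive part
uplus = u.*(u > 0);

% (b) RC filter h(t) = exp(-t/RC) for t >= 0, truncated to 5 time constants
RC = 0.3*T;                        % assumed value between 1/fc and the symbol time
h = exp(-(0:dt:5*RC)'/RC);
y = dt*conv(uplus, h);             % Riemann-sum approximation of the convolution
y = y(1:length(m));                % keep the samples aligned with m

% (c) DC block: subtract the empirical mean
mhat = y - mean(y);

plot(t, m, t, mhat/max(abs(mhat)));   % estimate is rescaled for comparison only
legend('message', 'scaled envelope detector output');
```

Changing RC in this sketch is one way to carry out the comparison asked for in 7).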

7)  Repeat  6)  for  different  values  of  RC  (both  too  large  and  too  small),  and  comment  on  how  the 
resulting  message  estimate  is  affected  by  the  value  of  RC. 

Envelope  detector  based  I/Q  downconversion 

We know from Chapter 2 that a passband signal can be downconverted to complex baseband by
mixing with the cosine and sine of the carrier and then lowpass filtering. However, implementing
mixers may not be easy at really high carrier frequencies (e.g., for coherent optical communication).
Envelope detection, after adding strong locally generated carrier components to the
received waveform, provides an alternative approach for downconversion that may be easier to
implement in such scenarios. Consider the QAM received signal

$v(t) = v_c(t) \cos 2\pi f_c t - v_s(t) \sin 2\pi f_c t$  (3.40)

where $v_c$, $v_s$ are real baseband messages. The receiver's local oscillator generates $A_c \cos(2\pi f_c t + \theta)$
and $A_c \sin(2\pi f_c t + \theta)$, where $\theta$ is the offset between the carrier reference used at the transmitter to
generate the QAM signal and the local copy of the carrier at the receiver. Instead of mixing the
local oscillator outputs against v(t), we add them to v(t) and then perform envelope detection,
as described in 6). That is, we perform the following operations:

• Pass $v(t) + A_c \cos(2\pi f_c t + \theta)$ through an envelope detector to obtain $\tilde{v}_c(t)$.

• Pass $v(t) + A_c \sin(2\pi f_c t + \theta)$ through an envelope detector to obtain $\tilde{v}_s(t)$.


8) For $A_c$ large enough, can you find simple relationships between $\tilde{v}_c(t)$, $\tilde{v}_s(t)$ and the original
messages $v_c(t)$, $v_s(t)$? Is there a simple relationship between the complex baseband waveforms
$\tilde{v}(t) = \tilde{v}_c(t) + j\tilde{v}_s(t)$ and $v(t) = v_c(t) + j v_s(t)$?

9) Generate $v_c$ and $v_s$ as in 1), using different sequences of random bits, and generate v(t) as
in (3.40), setting $f_c = 10/T$ as in 3). Implement the preceding envelope detection operations,
first setting $\theta = 0$. Plot $\tilde{v}_c(t)$ and $\tilde{v}_s(t)$ for $A_c$ “large enough.” Also plot for reference $v_c(t)$ and
$v_s(t)$. Comment on whether the results conform to your expectations from 8). How small can
you make $A_c$ while still getting “good” results?

10) Repeat 9) for $\theta$ =

11) Assuming that the receiver knows $\theta$, can you recover estimates of $v_c(t)$ and $v_s(t)$ from $\tilde{v}_c(t)$,
$\tilde{v}_s(t)$? Implement these operations and show plots of the estimates and the original waveforms.

Lab  Report 

•  Answer  all  questions  and  print  out  the  most  useful  plots  to  support  your  answers. 

•  Write  a  paragraph  about  any  questions  or  confusions  that  you  may  have  experienced  with 
this  lab. 


Software  Lab  3.2:  Frequency  modulation  basics 

Lab Objectives: The goal of this lab is to explore the characteristics of frequency modulated
signals using digitally modulated messages, and to explore demodulation using differentiation in
baseband.

Reading:  Section  3.3  (angle  modulation). 

Laboratory  Assignment 

Consider the message signal $m(t) = \sum_n b[n] p(t - nT)$, where the symbols b[n] are chosen from $\{-1, +1\}$, and
T is the symbol interval. You can generate such a signal by modifying Code Fragment 2.3.2.
Define a passband FM signal modulated by the message by

$s_p(t) = \cos(2\pi f_c t + \theta(t))$

where $f_c$ is the carrier frequency, and the phase

$\theta(t) = 2\pi k_f \int_0^t m(\tau) d\tau$

(assume $t \ge 0$). The complex envelope of $s_p$ with respect to reference $2\pi f_c t$ is given by $s(t) = e^{j\theta(t)}$.
We consider a digital message signal of the form $m(t) = \sum_n b[n] p(t - nT)$, where the b[n] are
chosen independently and with equal probability from $\{-1, +1\}$.

The following code fragment generates the FM waveform for a rectangular pulse $p(t) = I_{[0,1]}(t)$.


oversampling_factor = 16;
%for a pulse with amplitude one, the max frequency deviation is given by kf
kf=4;
%increase the oversampling factor when kf is large (larger frequency deviation, hence larger FM bandwidth)
oversampling_factor = ceil(max(kf,1)*oversampling_factor);
ts=1/oversampling_factor; %sampling time
nsamples = ceil(1/ts);
pulse = ones(nsamples,1); %rectangular pulse
nsymbols = 10;
symbols=zeros(nsymbols,1);
%random symbol sequence
symbols = sign(rand(nsymbols,1)-0.5);
%generate digitally modulated message
nsymbols_upsampled=1+(nsymbols-1)*nsamples;
symbols_upsampled=zeros(nsymbols_upsampled,1);
symbols_upsampled(1:nsamples:nsymbols_upsampled)=symbols;
message = conv(symbols_upsampled,pulse);
%FM signal phase obtained by integrating the message
theta = 2*pi*kf*ts*cumsum(message);
cenvelope=exp(j*theta);
L=length(cenvelope);
time=(0:L-1)*ts;
Icomponent = real(cenvelope);
Qcomponent = imag(cenvelope);
%plot I component
plot(time,Icomponent);

1) By modifying and enhancing the preceding code fragment as needed, plot the I and Q components
of the complex envelope for a random sequence of bits as a function of time for $k_f = 0.25$.
Also plot $\theta(t)/\pi$ versus t. How big are the changes in $\theta(t)$ corresponding to a given message
bit b[n]? Do you notice a pattern in how the I and Q components depend on the message bits
{b[n]}?

Remark: The special case of $k_f = 1/4$ is a digital modulation scheme known as Minimum Shift
Keying (MSK). It can be viewed as FM modulation using a digital message, but the plots of the I
and Q components should indicate that MSK can also be interpreted as the I and Q components
each being amplitude modulated by a different set of bits, with an offset between the I and Q
components.


2)  Now  redo  (1)  for  kf  =  4.  The  patterns  in  the  I  and  Q  components  are  much  harder  to  see  now. 
We  typically  do  not  use  such  wideband  FM  for  digital  modulation,  but  may  use  it  for  analog 
messages. 

3) For a complex baseband waveform $y(t) = y_c(t) + j y_s(t) = e(t) e^{j\theta(t)}$, we know that

$\theta(t) = \tan^{-1} \frac{y_s(t)}{y_c(t)}$

Show that

$\frac{d\theta(t)}{dt} = \frac{y_c(t) y_s'(t) - y_s(t) y_c'(t)}{y_c^2(t) + y_s^2(t)}$  (3.41)

For an FM signal, the message can be estimated as $\frac{1}{2\pi k_f} \frac{d\theta(t)}{dt}$, with the derivative computed using
a highpass filter. Thus, this can be viewed as a baseband version of the limiter-discriminator
demodulator for FM. It can be implemented using the following code fragment.


%baseband discriminator
%differencing operation approximates derivative
Iderivative = [0;diff(Icomponent)]/ts;
Qderivative = [0;diff(Qcomponent)]/ts;
%numerator and denominator follow (3.41)
message_estimate = (1/(2*pi*kf))*(Icomponent.*Qderivative - Qcomponent.*Iderivative)./(Icomponent.^2+Qcomponent.^2);


4)  Apply  the  preceding  approach  to  the  noiseless  FM  signals  generated  in  parts  1)  and  2).  Plot 
the  estimated  message  and  the  original  message  on  the  same  plot,  and  comment  on  whether  you 
are  getting  a  good  estimate. 

5)  Add  an  arbitrary  phase  to  the  complex  envelope. 

phi = 2*pi*rand; %phase uniform over [0,2 pi]
cenvelope = cenvelope*exp(j*phi);
%now apply baseband discriminator

Redo  4).  What  happens  to  the  estimated  message?  Are  you  still  getting  a  good  estimate  of  the 
original  message? 

6)  Now,  add  a  frequency  offset  as  well  as  a  phase  offset  to  the  complex  envelope. 

phi = 2*pi*rand; %phase uniform over [0,2 pi]
df = 0.3;
%time(:) makes the time vector a column, matching the column vector cenvelope
cenvelope = cenvelope.*exp(j*(2*pi*df*time(:)+phi));
%now apply baseband discriminator

Redo  4).  What  happens  to  the  estimated  message?  Are  you  still  getting  a  good  estimate  of  the 
original  message?  If  you  are  not  quite  getting  the  original  message  back,  what  can  you  do  to  fix 
the  situation? 

Remark:  You  should  find  that  this  crude  differentiation  technique  does  work  for  low  noise  (we 
are  considering  zero  noise).  However,  it  is  rather  fragile  when  noise  is  inserted.  We  do  not 
explore  this  in  this  lab,  but  you  are  welcome  to  try  adding  Gaussian  noise  samples  to  the  I  and 
Q  components  and  see  how  the  discriminator  performs  for  different  values  of  noise  variance.  At 
the  very  least,  one  would  need  to  lowpass  filter  the  message  estimate  obtained  above  to  average 
out  noise,  but  it  is  far  better  to  use  feedback-based  techniques  such  as  the  PLL  for  general  FM 
demodulation,  or,  for  digital  messages,  to  use  demodulation  techniques  that  use  the  structure  of 
the  message.  We  do  not  discuss  such  techniques  in  this  lab. 

7) We now explore the spectral properties of FM. For the complex envelope $s(t) = e^{j\theta(t)}$, compute
the Fourier transform numerically, choosing the length of the FFT (and hence $n_s$) so as to get
a frequency resolution of 0.1. You can modify Code Fragment 2.5.1 or reuse code from Lab 1.
Compute the power spectral density (PSD), defined as the magnitude squared of the Fourier
transform divided by the interval over which you are computing it, and then averaged over
multiple runs. Plot the PSD for $k_f = 1/4$ and $k_f = 4$. You can modify the following code
fragment as needed.
nsymbols = 1000;
symbols=zeros(nsymbols,1);
nruns=1000;
fs_desired=0.1; %desired frequency resolution
Nmin = ceil(1/(fs_desired*ts)); %minimum length DFT for desired frequency granularity
message_length=1+(nsymbols-1)*nsamples+length(pulse)-1;
Nmin = max(message_length,Nmin);
%for efficient computation, choose FFT size to be power of 2
Nfft = 2^(nextpow2(Nmin)) %FFT size = the next power of 2 at least as big as Nmin
psd=zeros(Nfft,1);
for runs=1:nruns,
    %random symbol sequence
    symbols = sign(rand(nsymbols,1)-0.5);
    nsymbols_upsampled=1+(nsymbols-1)*nsamples;
    symbols_upsampled=zeros(nsymbols_upsampled,1);
    symbols_upsampled(1:nsamples:nsymbols_upsampled)=symbols;
    message = conv(symbols_upsampled,pulse);
    %FM signal phase
    theta = 2*pi*kf*ts*cumsum(message);
    cenvelope=exp(j*theta);
    time=(0:length(cenvelope)-1)*ts;
    %freq domain signal computed using DFT
    cenvelope_freq = ts*fft(cenvelope,Nfft); %FFT of size Nfft, automatically zeropads as needed
    cenvelope_freq_centered = fftshift(cenvelope_freq); %shifts DC to center of spectrum
    psd=psd+abs(cenvelope_freq_centered).^2;
end
%normalize by the number of runs and by the observation interval (nsymbols symbol times)
psd=psd/(nruns*nsymbols);
fs=1/(Nfft*ts) %actual frequency resolution attained
%set of frequencies for which Fourier transform has been computed using DFT
freqs = ((1:Nfft)-1-Nfft/2)*fs;
%plot the PSD
plot(freqs,psd);

8) Plot the PSD for $k_f = 1/4$ and $k_f = 4$. Are your results consistent with Carson's formula?

9) Redo 8), replacing the rectangular pulse by a sine pulse: $p(t) = \sin(\pi t) I_{[0,1]}(t)$. Are your results
consistent with Carson's formula? Compare the spectrum occupancy in 8) and 9), commenting
on the roles of $k_f$ and p(t).

pulse = transpose(sin(pi*(0:ts:1)));

10)  Now,  let  us  increase  the  dynamic  range  of  the  message  by  replacing  the  bits  by  numbers 
drawn  from  a  Gaussian  distribution  with  the  same  variance. 

symbols = randn(nsymbols,1);

Compute and plot the PSD for a sinusoidal pulse, and compare the spectral occupancy with
that in 9).

11)  Assuming  that  the  unit  of  time  is  1  ms,  estimate  the  bandwidth  of  the  FM  signals  whose 
PSDs  you  plotted  in  9).  You  can  either  eyeball  it,  or  estimate  the  length  of  the  interval  over 
which  95%  of  the  signal  power  is  contained. 
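
One way to carry out the power containment estimate in 11) is sketched below. It assumes a PSD that is symmetric about DC, sampled on a uniform frequency grid, and it grows a symmetric interval around DC until the desired fraction of the power is captured. The stand-in PSD (a sinc-squared shape) and the grid are illustrative assumptions; in the lab, the vectors freqs and psd produced by the preceding code fragment would take their place.

```matlab
% Stand-in PSD for illustration (assumption): sinc^2 shape on a uniform grid.
df = 0.01;                          % frequency resolution (cycles per unit time)
freqs = (-20:df:20)';
x = pi*freqs;
psd = ones(size(freqs));
nz = (freqs ~= 0);
psd(nz) = (sin(x(nz))./x(nz)).^2;

frac = 0.95;                        % desired power containment
total = sum(psd)*df;                % total power on the grid (Riemann sum)

% grow a symmetric interval [-B/2, B/2] until it holds frac of the power
[~, order] = sort(abs(freqs));      % grid points ordered by distance from DC
contained = cumsum(psd(order))*df;  % power captured as the interval grows
k = find(contained >= frac*total, 1);
B = 2*abs(freqs(order(k)));         % two-sided containment bandwidth

% with time measured in ms, frequencies are in kHz
fprintf('95%% containment bandwidth: %.2f kHz\n', B);
```

The estimate only accounts for power inside the frequency range covered by the grid, so the grid should extend well beyond the bulk of the spectrum.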

Lab  Report 
• Answer all questions and print out the most useful plots to support your answers.

•  Write  a  paragraph  about  any  questions  or  confusions  that  you  may  have  experienced  with 
this  lab. 
Chapter 4

Digital Modulation


Figure 4.1: Running example: Binary antipodal signaling using a timelimited pulse. (The figure shows the bit sequence ...0110100... and marks the symbol interval T.)


Digital modulation is the process of translating bits to analog waveforms that can be sent over
a physical channel. Figure 4.1 shows an example of a baseband digitally modulated waveform,
where bits that take values in {0, 1} are mapped to symbols in {+1, -1}, which are then used
to modulate translates of a rectangular pulse, where the translation corresponding to successive
symbols is the symbol interval T. The modulated waveform can be represented as a sequence of
symbols (taking values ±1 in the example) multiplying translates of a pulse (rectangular in the
example). This is an example of a widely used form of digital modulation termed linear modulation,
where the transmitted signal depends linearly on the symbols to be sent. Our treatment of
linear modulation in this chapter generalizes this example in several ways. The modulated signal
in Figure 4.1 is a baseband signal, but what if we are constrained to use a passband channel
(e.g., a wireless cellular system operating at 900 MHz)? One way to handle this is to simply translate
this baseband waveform to passband by upconversion; that is, send $u_p(t) = u(t) \cos 2\pi f_c t$,
where the carrier frequency $f_c$ lies in the desired frequency band. However, what if the frequency
occupancy of the passband signal is strictly constrained? (Such constraints are often the result
of guidelines from standards or regulatory bodies, and serve to limit interference between users
operating in adjacent channels.) Clearly, the timelimited modulation pulse used in Figure 4.1
spreads out significantly in frequency. We must therefore learn to work with modulation pulses
which are better constrained in frequency. We may also wish to send information on both the
I and Q components. Finally, we may wish to pack in more bits per symbol; for example, we
could send 2 bits per symbol by using 4 levels, say {±1, ±3}.

Chapter plan: In Section 4.1, we develop an understanding of the structure of linearly modulated
signals, using the binary modulation in Figure 4.1 to lead into variants of this example,
corresponding to different signaling constellations which can be used for baseband and passband
channels. In Section 4.2, we discuss how to quantify the bandwidth of linearly modulated signals
by computing the power spectral density. With these basic insights in place, we turn in Section
4.3 to a discussion of modulation for bandlimited channels, treating signaling over baseband and
passband channels in a unified framework using the complex baseband representation. We note,
invoking Nyquist's sampling theorem to determine the degrees of freedom offered by bandlimited
channels, that linear modulation with a bandlimited modulation pulse can be used to fill all of
these degrees of freedom. We discuss how to design bandlimited modulation pulses based on the
Nyquist criterion for intersymbol interference (ISI) avoidance. These concepts are reinforced by
Software Lab 4.1, which provides a hands-on demonstration of Nyquist signaling in the absence
of noise. Finally, we discuss orthogonal and biorthogonal modulation in Section 4.4.

Software:  Over  the  course  of  this  and  later  chapters,  we  develop  a  simulation  framework  for 
simulating  linear  modulation  over  noisy  dispersive  channels.  Software  Lab  4.1  in  this  chapter  is 
a  first  step  in  this  direction.  Appendix  4.B  provides  guidance  for  developing  the  software  for  this 
lab. 


4.1  Signal  Constellations 


Figure 4.2: BPSK illustrated for fc — T and symbol sequence +1, -1, -1. The solid line corresponds
to the passband signal $u_p(t)$, and the dashed line to the baseband signal u(t). Note that,
due to the change in sign between the first and second symbols, there is a phase discontinuity of
$\pi$ at t = T.


The linearly modulated signal depicted in Figure 4.1 can be written in the following general
form:

$u(t) = \sum_n b[n] p(t - nT)$  (4.1)

where {b[n]} is a sequence of symbols, and p(t) is the modulating pulse. The symbols take values
in {-1, +1} in our example, and the modulating pulse is a rectangular timelimited pulse. As we
proceed through this chapter, we shall see that linear modulation as in (4.1) is far more generally
applicable, in terms of the set of possible values taken by the symbol sequence, as well as the
choice of modulating pulse.
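
The sum in (4.1) can be sketched numerically as follows. This is a minimal illustration rather than a definitive implementation: the symbol sequence, the number of samples per symbol, and the variable names are assumptions made for the example. The symbols are placed on an impulse train at the symbol rate, which is then convolved with samples of the pulse.

```matlab
% Sketch of (4.1): u(t) = sum_n b[n] p(t - nT), sampled m times per symbol.
% All numerical values below are illustrative assumptions.
T = 1;                              % symbol interval (arbitrary time unit)
m = 8;                              % samples per symbol
ts = T/m;                           % sampling interval
b = [1; -1; -1; 1; 1; -1];          % example symbol sequence in {-1,+1}
pulse = ones(m,1);                  % rectangular pulse on [0,T)

% place symbol b[n] at sample index 1+(n-1)*m, zeros elsewhere
b_up = zeros(1+(length(b)-1)*m, 1);
b_up(1:m:end) = b;

% convolving the impulse train with the pulse implements the sum in (4.1)
u = conv(b_up, pulse);
t = (0:length(u)-1)'*ts;

plot(t, u); xlabel('t'); ylabel('u(t)');
```

Replacing b by multilevel or complex-valued symbols, or pulse by samples of a different pulse, gives the other forms of linear modulation discussed in this chapter.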

The modulated waveform (4.1) is a baseband waveform. While it is timelimited in our example,
and hence cannot be strictly bandlimited, it is approximately bandlimited to a band around DC.
Now, if we are given a passband channel over which to send the information encoded in this
waveform, one easy approach is to send the passband signal

$u_p(t) = u(t) \cos 2\pi f_c t$  (4.2)

where $f_c$ is the carrier frequency. That is, the modulated baseband signal is sent as the I
component of the passband signal. To see what happens to the passband signal as a consequence
of the modulation, we plot it in Figure 4.2. For the nth symbol interval $nT \le t < (n+1)T$, we
have $u_p(t) = \cos 2\pi f_c t$ if b[n] = +1, and $u_p(t) = -\cos 2\pi f_c t = \cos(2\pi f_c t + \pi)$ if b[n] = -1. Thus,
binary antipodal modulation switches the phase of the carrier between two values 0 and $\pi$, which
is why it is termed Binary Phase Shift Keying (BPSK) when applied to a passband channel.

We know from Chapter 2 that any passband signal can be represented in terms of two real-valued
baseband waveforms, the I and Q components:

$u_p(t) = u_c(t) \cos 2\pi f_c t - u_s(t) \sin 2\pi f_c t$

The complex envelope of $u_p(t)$ is given by $u(t) = u_c(t) + j u_s(t)$. For BPSK, the I component is
modulated using binary antipodal signaling, while the Q component is not used, so that $u(t) = u_c(t)$.
However, noting that the two signals, $u_c(t) \cos 2\pi f_c t$ and $u_s(t) \sin 2\pi f_c t$, are orthogonal
regardless of the choice of $u_c$ and $u_s$, we realize that we can modulate both I and Q components
independently, without affecting their orthogonality. In this case, we have

$u_c(t) = \sum_n b_c[n] p(t - nT), \quad u_s(t) = \sum_n b_s[n] p(t - nT)$

The complex envelope is given by

$u(t) = u_c(t) + j u_s(t) = \sum_n (b_c[n] + j b_s[n]) p(t - nT) = \sum_n b[n] p(t - nT)$  (4.3)

where $\{b[n] = b_c[n] + j b_s[n]\}$ are complex-valued symbols.


Figure 4.3: QPSK illustrated for fc = with symbol sequences $\{b_c[n]\} = \{+1, -1, -1\}$ and
$\{b_s[n]\} = \{-1, +1, -1\}$. The phase of the passband signal is $-\pi/4$ in the first symbol interval,
switches to $3\pi/4$ in the second, and to $-3\pi/4$ in the third.


Let us see what happens to the passband signal when $b_c[n]$, $b_s[n]$ each take values in {±1} (i.e.,
$b[n] = b_c[n] + j b_s[n]$ takes values in {±1 ± j}). For the nth symbol interval $nT \le t < (n+1)T$:

$u_p(t) = \cos 2\pi f_c t - \sin 2\pi f_c t = \sqrt{2} \cos(2\pi f_c t + \pi/4)$ if $b_c[n] = +1$, $b_s[n] = +1$;

$u_p(t) = \cos 2\pi f_c t + \sin 2\pi f_c t = \sqrt{2} \cos(2\pi f_c t - \pi/4)$ if $b_c[n] = +1$, $b_s[n] = -1$;

$u_p(t) = -\cos 2\pi f_c t - \sin 2\pi f_c t = \sqrt{2} \cos(2\pi f_c t + 3\pi/4)$ if $b_c[n] = -1$, $b_s[n] = +1$;

$u_p(t) = -\cos 2\pi f_c t + \sin 2\pi f_c t = \sqrt{2} \cos(2\pi f_c t - 3\pi/4)$ if $b_c[n] = -1$, $b_s[n] = -1$.

Thus, the modulation causes the passband signal to switch its phase among four possibilities,
$\{\pm\pi/4, \pm 3\pi/4\}$, as illustrated in Figure 4.3, which is why we call it Quadrature Phase Shift
Keying (QPSK).

Equivalently, we could have seen this from the complex envelope. Note that the QPSK symbols
can be written as $b[n] = \sqrt{2} e^{j\theta[n]}$, where $\theta[n] \in \{\pm\pi/4, \pm 3\pi/4\}$. Thus, over the nth symbol, we
have

$u_p(t) = \mathrm{Re}\left(b[n] e^{j 2\pi f_c t}\right) = \mathrm{Re}\left(\sqrt{2} e^{j\theta[n]} e^{j 2\pi f_c t}\right) = \sqrt{2} \cos(2\pi f_c t + \theta[n])$,  $nT \le t < (n+1)T$

This indicates that it is actually easier to figure out what is happening to the passband signal
by working with the complex envelope. We therefore work in the complex baseband domain for
the remainder of this chapter.
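
The equivalence between the three descriptions of QPSK over a symbol interval (I/Q form, complex envelope form, and phase-shifted carrier) can be checked numerically. The following is a small sketch for that purpose; the carrier frequency and time grid are arbitrary assumptions, with T normalized to 1.

```matlab
% Check the four QPSK cases numerically (fc and the time grid are assumptions).
fc = 4;                             % carrier frequency in cycles per symbol interval
t = (0:0.001:1)';                   % one symbol interval, T = 1
bc = [ 1  1 -1 -1];                 % I symbols
bs = [ 1 -1  1 -1];                 % Q symbols
for k = 1:4
    b = bc(k) + 1j*bs(k);           % complex symbol
    theta = angle(b);               % phase of the symbol
    up_iq = bc(k)*cos(2*pi*fc*t) - bs(k)*sin(2*pi*fc*t);   % I/Q form
    up_env = real(b*exp(1j*2*pi*fc*t));                    % complex envelope form
    up_phase = sqrt(2)*cos(2*pi*fc*t + theta);             % phase-shifted carrier
    err = max(abs(up_iq - up_env)) + max(abs(up_iq - up_phase));
    fprintf('bc=%+d, bs=%+d: phase = %+.2f*pi, max mismatch = %.1e\n', ...
            bc(k), bs(k), theta/pi, err);
end
```

The printed phases are $+0.25\pi$, $-0.25\pi$, $+0.75\pi$ and $-0.75\pi$, in the same order as the four cases listed above, and the mismatch between the three forms is at the level of floating point error.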

In general, the complex envelope for a linearly modulated signal is given by (4.1), where
$b[n] = b_c[n] + j b_s[n] = r[n] e^{j\theta[n]}$ can be complex-valued. We can view this as $b_c[n]$ modulating the
I component and $b_s[n]$ modulating the Q component, or as scaling the envelope by r[n] and
switching the phase by $\theta[n]$. The set of values that each symbol can take is called the signaling
alphabet, or constellation. We can plot the constellation in a two-dimensional plot, with the
x-axis denoting the real part $b_c[n]$ (corresponding to the I component) and the y-axis denoting the
imaginary part $b_s[n]$ (corresponding to the Q component). Indeed, this is why linear modulation
over passband channels is also termed two-dimensional modulation. Note that this provides a
unified description of constellations that can be used over both baseband and passband channels:
for physical baseband channels, we simply constrain $b[n] = b_c[n]$ to be real-valued, setting
$b_s[n] = 0$.


Figure 4.4: Some commonly used constellations (panels: BPSK/2PAM, 4PAM, QPSK/4PSK/4QAM,
8PSK, 16QAM). Note that 2PAM and 4PAM can be used over both baseband and passband
channels, while the two-dimensional constellations QPSK, 8PSK and 16QAM are for use over
passband channels.


Figure  4.4  shows  some  common  constellations.  Pulse  Amplitude  Modulation  (PAM)  corresponds 
to  using  multiple  amplitude  levels  along  the  I  component  (setting  the  Q  component  to  zero). 
This  is  often  used  for  signaling  over  physical  baseband  channels.  Using  PAM  along  both  I  and  Q 
axes  corresponds  to  Quadrature  Amplitude  Modulation  (QAM).  If  the  constellation  points  lie  on 
a  circle,  they  only  affect  the  phase  of  the  carrier:  such  signaling  schemes  are  termed  Phase  Shift 
Keying  (PSK).  When  naming  a  modulation  scheme,  we  usually  indicate  the  number  of  points 
in  the  constellations.  BPSK  and  QPSK  are  special:  BPSK  (or  2PSK)  can  also  be  classified  as 
2PAM,  while  QPSK  (or  4PSK)  can  also  be  classified  as  4QAM. 

Each symbol in a constellation of size M can be uniquely mapped to $\log_2 M$ bits. For a symbol
rate of 1/T symbols per unit time, the bit rate is therefore $\frac{\log_2 M}{T}$ bits per unit time. Since the
transmitted bits often contain redundancy due to a channel code employed for error correction or
detection, the information rate is typically smaller than the bit rate. The choice of constellation
for a particular application depends on considerations such as power-bandwidth tradeoffs and
implementation complexity. We shall discuss these issues once we develop more background.
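
As a small illustration of these definitions, the following sketch builds some of the constellations of Figure 4.4 as sets of complex numbers and evaluates the bits per symbol and the resulting bit rate. The symbol rate, the amplitude scaling, and the rotation chosen for 8PSK are assumptions made for the example and are not taken from the figure.

```matlab
% Sketch: build a few constellations and count bits per symbol.
pam4  = [-3 -1 1 3];                            % 4PAM levels
qpsk  = [1+1j, 1-1j, -1+1j, -1-1j];             % QPSK/4QAM points
psk8  = exp(1j*2*pi*(0:7)/8);                   % 8PSK: points on the unit circle
[I, Q] = meshgrid(pam4, pam4);
qam16 = I(:).' + 1j*Q(:).';                     % 16QAM: 4PAM on both I and Q

Rs = 1e6;                                       % assumed symbol rate 1/T (symbols/s)
names = {'4PAM', 'QPSK', '8PSK', '16QAM'};
sets  = {pam4, qpsk, psk8, qam16};
for k = 1:numel(sets)
    M = numel(sets{k});                         % constellation size
    fprintf('%-6s M = %2d, %d bits/symbol, bit rate = %.0f Mb/s\n', ...
            names{k}, M, log2(M), Rs*log2(M)/1e6);
end
```

For the assumed symbol rate of $10^6$ symbols per second, 16QAM carries 4 bits per symbol and hence 4 Mb/s, twice the bit rate of QPSK at the same symbol rate.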
4.2 Bandwidth Occupancy


Bandwidth is a precious commodity, hence it is important to quantify the frequency occupancy
of communication signals. To this end, consider the complex envelope of a linearly modulated
signal (the two-sided bandwidth of this complex envelope equals the physical bandwidth of the
corresponding passband signal), which has the form given in (4.1): $u(t) = \sum_n b[n] p(t - nT)$.
The complex-valued symbol sequence {b[n]} is modeled as random. Modeling the sequence as
random at the transmitter makes sense because the latter does not control the information being
sent (e.g., it depends on the specific computer file or digital audio signal being sent). Since this
information is mapped to the symbols in some fashion, it follows that the symbols themselves are
also random rather than deterministic. Modeling the symbols as random at the receiver makes
even more sense, since the receiver by definition does not know the symbol sequence (otherwise
there would be no need to transmit). However, for characterizing the bandwidth occupancy of the
digitally modulated signal u, we do not compute statistics across different possible realizations
of the symbol sequence {b[n]}. Rather, we define the quantities of interest in terms of averages
across time, treating u(t) as a finite power signal which can be modeled as deterministic once the
symbol sequence {b[n]} is fixed. (We discuss concepts of statistical averaging across realizations
later, when we discuss random processes in Chapter 5.)

We  introduce  the  concept  of  PSD  in  Section  4.2.1.  In  Section  4.2.2,  we  state  our  main  result  on 
the  PSD  of  digitally  modulated  signals,  and  discuss  how  to  compute  bandwidth  once  we  know 
the  PSD. 


4.2.1  Power  Spectral  Density 


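Figure 4.5: Operational definition of PSD. (The signal x(t) is passed through a narrowband filter H(f)
of unit gain and width $\Delta f$ centered at $\nu$, followed by a power meter whose reading is $S_x(\nu) \Delta f$.)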


We  now  introduce  the  important  concept  of  power  spectral  density  (PSD),  which  specifies  how 
the  power  in  a  signal  is  distributed  in  different  frequency  bands. 

Power Spectral Density: The power spectral density (PSD), $S_x(f)$, for a finite-power signal
x(t) is defined through the conceptual measurement depicted in Figure 4.5. Pass x(t) through
an ideal narrowband filter with transfer function

$H_\nu(f) = \begin{cases} 1, & \nu - \frac{\Delta f}{2} < f < \nu + \frac{\Delta f}{2} \\ 0, & \text{else} \end{cases}$

The PSD evaluated at $\nu$, $S_x(\nu)$, is defined as the measured power at the filter output, divided
by the filter width $\Delta f$ (in the limit as $\Delta f \to 0$).

Example (PSD of complex exponentials): Let us now find the PSD of $x(t) = A e^{j(2\pi f_0 t + \theta)}$.
Since the frequency content of x is concentrated at $f_0$, the power meter in Figure 4.5 will have
zero output for $\nu \ne f_0$ (as $\Delta f \to 0$, $f_0$ falls outside the filter bandwidth for any such $\nu$). Thus,
$S_x(f) = 0$ for $f \ne f_0$. On the other hand, for $\nu = f_0$, the output of the power meter is the entire
power of x, which is

$P_x = A^2 = \int_{f_0 - \frac{\Delta f}{2}}^{f_0 + \frac{\Delta f}{2}} S_x(f) df$

We conclude that the PSD is $S_x(f) = A^2 \delta(f - f_0)$. Extending this reasoning to a sum of complex
exponentials, we have

PSD of $\sum_i A_i e^{j(2\pi f_i t + \theta_i)}$ $= \sum_i A_i^2 \delta(f - f_i)$

where $f_i$ are distinct frequencies (positive or negative), and $A_i$, $\theta_i$ are the amplitude and phase,
respectively, of the ith complex exponential. Thus, for a real-valued sinusoid, we obtain

$S_x(f) = \frac{1}{4} \delta(f - f_0) + \frac{1}{4} \delta(f + f_0)$, for $x(t) = \cos(2\pi f_0 t + \theta) = \frac{1}{2} e^{j(2\pi f_0 t + \theta)} + \frac{1}{2} e^{-j(2\pi f_0 t + \theta)}$  (4.4)
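
As a quick consistency check on the PSD of a real-valued sinusoid (a worked step that uses only the definitions above, and assumes $f_0 \ne 0$ so that the two frequencies are distinct), integrating the PSD over all frequencies recovers the familiar time-averaged power of a unit amplitude sinusoid:

```latex
P_x = \int_{-\infty}^{\infty} \left[ \tfrac{1}{4}\,\delta(f - f_0) + \tfrac{1}{4}\,\delta(f + f_0) \right] df
    = \tfrac{1}{4} + \tfrac{1}{4} = \tfrac{1}{2},
\qquad
\overline{\cos^2(2\pi f_0 t + \theta)}
    = \overline{\tfrac{1}{2} + \tfrac{1}{2}\cos(4\pi f_0 t + 2\theta)} = \tfrac{1}{2}.
```

The two complex exponentials each carry power $(1/2)^2 = 1/4$, which is where the weights of the two impulses come from.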


Periodogram-based PSD estimation: One way to carry out the conceptual measurement in
Figure 4.5 is to limit x(t) to a finite observation interval, compute its Fourier transform and hence
its energy spectral density (which is the magnitude squared of the Fourier transform), and then
divide by the length of the observation interval. The PSD is obtained by letting the observation
interval get large. Specifically, define the time-windowed version of x as

$x_{T_o}(t) = x(t) I_{[-\frac{T_o}{2}, \frac{T_o}{2}]}(t)$  (4.5)

where $T_o$ is the length of the observation interval. Since $T_o$ is finite and x(t) has finite power,
$x_{T_o}(t)$ has finite energy, and we can compute its Fourier transform

$X_{T_o}(f) = \mathcal{F}(x_{T_o})$

The energy spectral density of $x_{T_o}$ is given by $|X_{T_o}(f)|^2$. Averaging this over the observation
interval, we obtain the estimated PSD

$\hat{S}_x(f) = \frac{|X_{T_o}(f)|^2}{T_o}$  (4.6)

The estimate in (4.6), which is termed a periodogram, can typically be obtained by taking the
DFT of a sampled version of the time windowed signal; the time interval $T_o$ must be large enough
to give the desired frequency resolution, while the sampling rate must be large enough to capture
the variations in x(t). The estimated PSDs obtained over multiple observation intervals can then
be averaged further to get smoother estimates.
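
The periodogram estimate (4.6), averaged over several observation intervals, can be sketched as follows. This is a minimal illustration under assumed parameter values: the test signal is a complex exponential with a random phase in each interval, and the sampling interval, the interval length, and the number of runs are arbitrary choices. Only the magnitude of the transform enters the estimate, so the window is taken as $[0, T_o)$ for convenience.

```matlab
% Sketch of an averaged periodogram, following (4.6).
% The test signal and all parameter values are illustrative assumptions.
ts = 0.01;                          % sampling interval
To = 20;                            % observation interval length
N = round(To/ts);                   % samples per observation interval
t = (0:N-1)'*ts;
f0 = 5; A = 2;                      % test signal A*exp(j*(2*pi*f0*t+theta))
nruns = 50;                         % number of observation intervals to average

psd = zeros(N,1);
for run = 1:nruns
    theta = 2*pi*rand;              % random phase in each interval
    x = A*exp(1j*(2*pi*f0*t + theta));
    X = ts*fft(x);                  % ts*DFT approximates the Fourier transform
    psd = psd + abs(X).^2/To;       % periodogram: energy spectral density / To
end
psd = fftshift(psd/nruns);          % average, with DC moved to the center
freqs = ((0:N-1)' - N/2)/(N*ts);    % frequency grid, resolution 1/To

Px = sum(psd)/(N*ts);               % integrate the estimated PSD over frequency
plot(freqs, psd); xlabel('f'); ylabel('estimated PSD');
```

The estimate is concentrated at $f_0$, and the value Px obtained by integrating it equals $A^2 = 4$ up to floating point error, in agreement with the PSD $A^2 \delta(f - f_0)$ of a complex exponential.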

Formally, we can define the PSD in the limit of large time windows as follows:

$S_x(f) = \lim_{T_o \to \infty} \frac{|X_{T_o}(f)|^2}{T_o}$  (4.7)


Units  for  PSD:  Power  per  unit  frequency  has  the  same  units  as  power  multiplied  by  time,  or 
energy.  Thus,  the  PSD  is  expressed  in  units  of  Watts/Hertz,  or  Joules. 


Power in terms of PSD: The power $P_x$ of a finite power signal x is given by integrating its
PSD:

$P_x = \int_{-\infty}^{\infty} S_x(f) df$  (4.8)
4.2.2 PSD of a linearly modulated signal


We are now ready to state our result on the PSD of a linearly modulated signal $u(t) = \sum_n b[n] p(t - nT)$.
While we derive a more general result in Appendix 4.A, our result here applies to the following
important special case:

(a) the symbols have zero DC value: $\lim_{N \to \infty} \frac{1}{2N+1} \sum_{n=-N}^{N} b[n] = 0$; and

(b) the symbols are uncorrelated: $\lim_{N \to \infty} \frac{1}{2N+1} \sum_{n=-N}^{N} b[n] b^*[n-k] = 0$ for $k \ne 0$.


Theorem  4.2.1  (PSD  of  a  linearly  modulated  signal)  Consider  a  linearly  modulated  signal 

$$u(t) = \sum_n b[n] p(t - nT)$$

where the symbol sequence $\{b[n]\}$ is zero mean and uncorrelated with average symbol energy

$$\overline{|b[n]|^2} = \lim_{N \to \infty} \frac{1}{2N+1} \sum_{n=-N}^{N} |b[n]|^2 = \sigma_b^2$$

Then the PSD is given by

$$S_u(f) = \frac{|P(f)|^2}{T}\, \sigma_b^2$$ (4.9)

and the power of the modulated signal is

$$P_u = \frac{\sigma_b^2 \|p\|^2}{T}$$ (4.10)

where $\|p\|^2$ denotes the energy of the modulating pulse.


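See Appendix 4.A for a proof of (4.9), which follows from specializing a more general expression.
The expression for power follows from integrating the PSD:

$$P_u = \int_{-\infty}^{\infty} S_u(f)\, df = \frac{\sigma_b^2}{T} \int_{-\infty}^{\infty} |P(f)|^2\, df = \frac{\sigma_b^2}{T} \int_{-\infty}^{\infty} |p(t)|^2\, dt = \frac{\sigma_b^2 \|p\|^2}{T}$$

where we have used Parseval's identity.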

An intuitive interpretation of this theorem is as follows. Every T time units, we send a pulse
of the form $b[n] p(t - nT)$ with average energy spectral density $\sigma_b^2 |P(f)|^2$, so that the PSD is
obtained by dividing this by T. The same reasoning applies to the expression for power: every
T time units, we send a pulse $b[n] p(t - nT)$ with average energy $\sigma_b^2 \|p\|^2$, so that the power is
obtained by dividing by T. The preceding intuition does not apply when successive symbols are
correlated, in which case we get the more complicated expression (4.32) for the PSD in Appendix
4.A.

Once  we  know  the  PSD,  we  can  define  the  bandwidth  of  u  in  a  number  of  ways. 

3 dB bandwidth: For symmetric $S_u(f)$ with a maximum at f = 0, the 3 dB bandwidth $B_{3dB}$
is defined by $S_u(B_{3dB}/2) = S_u(-B_{3dB}/2) = \frac{1}{2} S_u(0)$. That is, the 3 dB bandwidth is the size
of the interval between the points at which the PSD is 3 dB, or a factor of 2, smaller than its
maximum value.

Fractional power containment bandwidth: This is the size of the smallest interval that
contains a given fraction of the power. For example, for symmetric $S_u(f)$, the 99% fractional
power containment bandwidth B is defined by

$$\int_{-B/2}^{B/2} S_u(f)\, df = 0.99\, P_u = 0.99 \int_{-\infty}^{\infty} S_u(f)\, df$$

(replace 0.99 in the preceding equation by any desired fraction $\gamma$ to get the corresponding $\gamma$
power containment bandwidth).

Time/frequency  normalization:  Before  we  discuss  examples  in  detail,  let  us  simplify  our 
life  by  making  a  simple  observation  on  time  and  frequency  scaling.  Suppose  we  have  a  linearly 
modulated  system  operating  at  a  symbol  rate  of  1/T,  as  in  (4.1).  We  can  think  of  it  as  a 
normalized  system  operating  at  a  symbol  rate  of  one,  where  the  unit  of  time  is  T.  This  implies 
that  the  unit  of  frequency  is  1/T.  In  terms  of  these  new  units,  we  can  write  the  linearly  modulated 
signal  as 

$$u_1(t) = \sum_n b[n] p_1(t - n)$$

where $p_1(t)$ is the modulation pulse for the normalized system. For example, for a rectangular
pulse timelimited to the symbol interval, we have $p_1(t) = I_{[0,1]}(t)$. Suppose now that the bandwidth
of the normalized system (computed using any definition that we please) is $B_1$. Since
the unit of frequency is 1/T, the bandwidth in the original system is $B_1/T$. Thus, in terms of
determining frequency occupancy, we can work, without loss of generality, with the normalized
system. In the original system, what we are really doing is working with the normalized time
t/T and the normalized frequency fT.


Figure 4.6: PSD corresponding to rectangular and sine timelimited pulses, plotted against the
normalized frequency fT. The main lobe of the PSD is broader for the sine pulse, but its 99%
power containment bandwidth is much smaller.


Rectangular pulse: Without loss of generality, consider a normalized system with $p_1(t) = I_{[0,1]}(t)$,
for which $P_1(f) = \mathrm{sinc}(f)\, e^{-j\pi f}$. For $\{b[n]\}$ i.i.d., taking values ±1 with equal probability,
we have $\sigma_b^2 = 1$. Applying (4.9), we obtain

$$S_{u_1}(f) = \sigma_b^2\, \mathrm{sinc}^2(f)$$ (4.11)

Integrating, or applying (4.10), we obtain $P_u = \sigma_b^2$. The scale factor of $\sigma_b^2$ is not important, since
it drops out for any definition of bandwidth. We therefore set it to $\sigma_b^2 = 1$. The PSD for the
rectangular pulse, along with that for a sine pulse introduced shortly, is plotted in Figure 4.6.
Note that the PSD for the rectangular pulse has much fatter tails, which does not bode well for
its bandwidth efficiency. For fractional power containment bandwidth with fraction $\gamma$, we have
the equation

$$\int_{-B_1/2}^{B_1/2} \mathrm{sinc}^2 f\, df = \gamma \int_{-\infty}^{\infty} \mathrm{sinc}^2 f\, df = \gamma \int_0^1 1^2\, dt = \gamma$$

using Parseval's identity. We therefore obtain, using the symmetry of the PSD, that the bandwidth
is the numerical solution to the equation

$$\int_0^{B_1/2} \mathrm{sinc}^2 f\, df = \gamma/2$$ (4.12)

For example, for $\gamma = 0.99$, we obtain $B_1 = 10.2$, while for $\gamma = 0.9$, we obtain $B_1 = 0.85$.
Thus, if we wish to be strict about power containment (e.g., in order to limit adjacent channel
interference in wireless systems), the rectangular timelimited pulse is a very poor choice. On the
other hand, in systems where interference or regulation are not significant issues (e.g., low-cost
wired systems), this pulse may be a good choice because of its ease of implementation using
digital logic.
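
The numerical solution of (4.12) can be sketched with a simple trapezoidal-rule integration that stops once the running area reaches $\gamma/2$. The helper name `containment_limit`, the step size and the search cap are assumptions for illustration. The value returned is the upper limit of integration in (4.12), which that equation writes as $B_1/2$.

```python
import math

def sinc(x):
    """sinc(x) = sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def containment_limit(psd, gamma, df=1e-3, f_max=100.0):
    """Smallest x such that the integral of psd over [0, x] reaches gamma / 2.

    Assumes psd is symmetric about f = 0 and has total area 1.
    """
    target = gamma / 2.0
    area, f, prev = 0.0, 0.0, psd(0.0)
    while f < f_max:
        cur = psd(f + df)
        step = 0.5 * (prev + cur) * df      # trapezoidal rule on [f, f + df]
        if area + step >= target:
            # linear interpolation inside the final slice
            return f + df * (target - area) / step
        area, f, prev = area + step, f + df, cur
    raise ValueError("fraction not reached below f_max")

def rect_psd(f):
    """PSD (4.11) of the normalized rectangular pulse with unit symbol energy."""
    return sinc(f) ** 2

limit_90 = containment_limit(rect_psd, 0.90)
limit_99 = containment_limit(rect_psd, 0.99)
```

Under these assumptions the limit comes out close to 0.85 for $\gamma = 0.9$ and roughly 10 for $\gamma = 0.99$, which shows how slowly the sinc-squared tails accumulate the last few percent of the power.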


Example 4.2.1 (Bandwidth computation): Consider a passband system operating at a carrier
frequency of 2.4 GHz at a bit rate of 20 Mbps. A rectangular modulation pulse timelimited to the
symbol interval is employed.

(a) Find the 99% and 90% power containment bandwidths if the constellation used is 16-QAM.

(b) Find the 99% and 90% power containment bandwidths if the constellation used is QPSK.

Solution:

(a) The 16-QAM system sends 4 bits/symbol, so that the symbol rate 1/T equals
(20 Mbps)/(4 bits/symbol) = 5 Msymbols/sec. Since the 99% power containment bandwidth for the
normalized system is $B_1 = 10.2$, the required bandwidth is $B_1/T = 51$ MHz. Since the 90% power
containment bandwidth for the normalized system is $B_1 = 0.85$, the required bandwidth $B_1/T$
equals 4.25 MHz.

(b) The QPSK system sends 2 bits/symbol, so that the symbol rate is 10 Msymbols/sec. The
bandwidths required are therefore double those in (a): the 99% power containment bandwidth
is 102 MHz, while the 90% power containment bandwidth is 8.5 MHz.

Clearly,  when  the  criterion  for  defining  bandwidth  is  the  same,  then  16-QAM  consumes  half  the 
bandwidth  compared  to  QPSK  for  a  fixed  bit  rate.  However,  it  is  interesting  to  note  that,  for 
the  rectangular  timelimited  pulse,  a  QPSK  system  where  we  are  sloppy  about  power  leakage 
(90%  power  containment  bandwidth  of  8.5  MHz)  can  require  far  less  bandwidth  than  a  system 
using  a  more  bandwidth-efficient  16-QAM  constellation  where  we  are  strict  about  power  leakage 
(99%  power  containment  bandwidth  of  51  MHz).  This  extreme  variation  of  bandwidth  when  we 
tweak  definitions  slightly  is  because  of  the  poor  frequency  domain  containment  of  the  rectangular 
timelimited  pulse.  Thus,  if  we  are  serious  about  limiting  frequency  occupancy,  we  need  to  think 
about  more  sophisticated  designs  for  the  modulation  pulse. 
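
The arithmetic in Example 4.2.1 amounts to scaling the normalized bandwidth by the symbol rate. A minimal sketch, taking the normalized values 10.2 and 0.85 as quoted in the text and using an illustrative function name:

```python
import math

def containment_bandwidth_hz(bit_rate_bps, constellation_size, normalized_bandwidth):
    """Scale a normalized bandwidth B_1 to Hz for a given bit rate and alphabet size."""
    bits_per_symbol = math.log2(constellation_size)
    symbol_rate = bit_rate_bps / bits_per_symbol      # 1/T in symbols/sec
    return normalized_bandwidth * symbol_rate         # B_1 / T in Hz

B1_99, B1_90 = 10.2, 0.85    # normalized values quoted in the text
BIT_RATE = 20e6              # 20 Mbps

bw_16qam = (containment_bandwidth_hz(BIT_RATE, 16, B1_99),
            containment_bandwidth_hz(BIT_RATE, 16, B1_90))
bw_qpsk = (containment_bandwidth_hz(BIT_RATE, 4, B1_99),
           containment_bandwidth_hz(BIT_RATE, 4, B1_90))
```

Halving the number of bits per symbol doubles the symbol rate, and hence every bandwidth figure, which is the factor of two between parts (a) and (b).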


Smoothing  out  the  rectangular  pulse:  A  useful  alternative  to  using  the  rectangular  pulse, 
while  still  keeping  the  modulating  pulse  timelimited  to  a  symbol  interval,  is  the  sine  pulse,  which 
for  the  normalized  system  equals 


$$p_1(t) = \sqrt{2}\, \sin(\pi t)\, I_{[0,1]}(t)$$

Since  the  sine  pulse  does  not  have  the  sharp  edges  of  the  rectangular  pulse  in  the  time  domain, 
we  expect  it  to  be  more  compact  in  the  frequency  domain.  Note  that  we  have  normalized  the 
pulse  to  have  unit  energy,  as  we  did  for  the  normalized  rectangular  pulse.  This  implies  that  the 
power  of  the  modulated  signal  is  the  same  in  the  two  cases,  so  that  we  can  compare  PSDs  under 
the constraint that the area under the PSDs remains constant. Setting $\sigma_b^2 = 1$ and using (4.9),
we obtain (see Problem 4.1):

$$S_{u_1}(f) = |P_1(f)|^2 = \frac{8 \cos^2 \pi f}{\pi^2 (1 - 4f^2)^2}$$ (4.13)


Proceeding as we did for obtaining (4.12), the fractional power containment bandwidth for fraction
$\gamma$ is given by the formula:

$$\frac{8}{\pi^2} \int_0^{B_1/2} \frac{\cos^2 \pi f}{(1 - 4f^2)^2}\, df = \gamma/2$$ (4.14)

For $\gamma = 0.99$, we obtain $B_1 = 1.2$, which is an order of magnitude improvement over the
corresponding value of $B_1 = 10.2$ for the rectangular pulse.
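
Equation (4.14) can be solved numerically in the same way as (4.12). The sketch below is self-contained; the helper names, step size and the tolerance used to detect the removable singularity at f = 1/2 are assumptions for illustration. As before, the returned value is the upper limit of integration, written as $B_1/2$ in (4.14).

```python
import math

def sine_pulse_psd(f):
    """PSD (4.13) of the unit-energy sine pulse in the normalized system."""
    den = 1.0 - 4.0 * f * f
    if abs(den) < 1e-6:
        # numerator and denominator both vanish at f = 1/2; the limit is 1/2
        return 0.5
    return 8.0 * math.cos(math.pi * f) ** 2 / (math.pi ** 2 * den ** 2)

def containment_limit(psd, gamma, df=1e-3, f_max=100.0):
    """Smallest x such that the integral of psd over [0, x] reaches gamma / 2.

    Assumes psd is symmetric about f = 0 and has total area 1.
    """
    target = gamma / 2.0
    area, f, prev = 0.0, 0.0, psd(0.0)
    while f < f_max:
        cur = psd(f + df)
        step = 0.5 * (prev + cur) * df      # trapezoidal rule on [f, f + df]
        if area + step >= target:
            return f + df * (target - area) / step
        area, f, prev = area + step, f + df, cur
    raise ValueError("fraction not reached below f_max")

sine_limit_99 = containment_limit(sine_pulse_psd, 0.99)
```

With these choices the 99% limit lands in the neighborhood of 1.2, inside the main lobe of (4.13), whose first null is at f = 1.5.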

While  the  sine  pulse  has  better  frequency  domain  containment  than  the  rectangular  pulse,  it  is 
still  not  suitable  for  strictly  bandlimited  channels.  We  discuss  pulse  design  for  such  channels 
next. 


4.3  Design  for  Bandlimited  Channels 

Suppose  that  you  are  told  to  design  your  digital  communication  system  so  that  the  transmitted 
signal  fits  between  2.39  and  2.41  GHz;  that  is,  you  are  given  a  passband  channel  of  bandwidth  20 
MHz  at  a  carrier  frequency  of  2.4  GHz.  Any  signal  that  you  transmit  over  this  band  has  a  complex 
envelope  with  respect  to  2.4  GHz  that  occupies  a  band  from  -10  MHz  to  10  MHz.  Similarly,  the 
passband  channel  (modeled  as  an  LTI  system)  has  an  impulse  response  whose  complex  envelope  is 
bandlimited  from  -10  MHz  to  10  MHz.  In  general,  for  a  passband  channel  or  signal  of  bandwidth 
W,  with  an  appropriate  choice  of  reference  frequency,  we  have  a  corresponding  complex  baseband 
signal spanning the band $[-W/2, W/2]$. Thus, we restrict our design to the complex baseband
domain,  with  the  understanding  that  the  designs  can  be  translated  to  passband  channels  by 
upconversion  of  the  I  and  Q  components  at  the  transmitter,  and  downconversion  at  the  receiver. 
Also,  note  that  the  designs  specialize  to  physical  baseband  channels  if  we  restrict  the  baseband 
signals  to  be  real-valued. 


4.3.1 Nyquist’s Sampling Theorem and the Sinc Pulse

Our  first  step  in  understanding  communication  system  design  for  such  a  bandlimited  channel  is 
to understand the structure of bandlimited signals. To this end, suppose that the signal s(t) is
bandlimited to $[-W/2, W/2]$. We can now invoke Nyquist’s sampling theorem (proof postponed
to Section 4.5) to express the signal in terms of its samples at rate W.

Theorem 4.3.1 (Nyquist’s sampling theorem) Any signal s(t) bandlimited to $[-\frac{W}{2}, \frac{W}{2}]$ can
be described completely by its samples $\{s(\frac{n}{W})\}$ at rate W. The signal s(t) can be recovered from
its samples using the following interpolation formula:

$$s(t) = \sum_{n=-\infty}^{\infty} s\left(\frac{n}{W}\right) p\left(t - \frac{n}{W}\right)$$ (4.15)

where p(t) = sinc(Wt).
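
To make the interpolation formula (4.15) concrete, the sketch below reconstructs a bandlimited signal at an off-grid time from its rate-W samples, truncating the infinite sum. The test signal $\mathrm{sinc}^2(Wt/2)$ (whose spectrum is a triangle on $[-W/2, W/2]$), the value W = 4 and the truncation length are assumptions chosen for illustration.

```python
import math

def sinc(x):
    """sinc(x) = sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def interpolate(sample_at, w, t, n_terms):
    """Truncated version of (4.15): sum over n = -n_terms..n_terms."""
    # p(t - n/W) = sinc(W (t - n/W)) = sinc(W t - n)
    return sum(sample_at(n / w) * sinc(w * t - n)
               for n in range(-n_terms, n_terms + 1))

W = 4.0

def s(t):
    """Hypothetical test signal, bandlimited to [-W/2, W/2]."""
    return sinc(W * t / 2.0) ** 2

t_test = 0.3                               # not a sampling instant
reconstructed = interpolate(s, W, t_test, 200)
exact = s(t_test)
```

The truncated sum converges quickly here only because the chosen test signal itself decays as 1/t²; for slowly decaying signals many more terms are needed.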
Degrees of freedom: What does the sampling theorem tell us about digital modulation? The
interpolation formula (4.15) tells us that we can interpret s(t) as a linearly modulated signal
with symbol sequence equal to the samples {s(n/W)}, symbol rate 1/T equal to the bandwidth
W, and modulation pulse given by $p(t) = \mathrm{sinc}(Wt) \leftrightarrow P(f) = \frac{1}{W} I_{[-W/2, W/2]}(f)$. Thus, linear
modulation with the sinc pulse is able to exploit all the “degrees of freedom” available in a
bandlimited channel.

Signal  space:  If  we  signal  over  an  observation  interval  of  length  T0  using  linear  modulation 
according to the interpolation formula (4.15), then we have approximately $WT_0$ complex-valued
samples. Thus, while the signals we send are continuous-time signals, which, in general, lie in an
infinite-dimensional space, the set of possible signals we can send in a finite observation interval
of length $T_0$ lies in a complex-valued vector space of finite dimension $WT_0$, or equivalently, a
real-valued vector space of dimension $2WT_0$. Such geometric views of communication signals as
vectors,  often  termed  signal  space  concepts,  are  particularly  useful  in  design  and  analysis,  as  we 
explore  in  more  detail  in  Chapter  6. 


Figure 4.7: Three successive sinc pulses (each pulse is truncated to a length of 10 symbol intervals
on each side) modulated by +1, -1, +1. The actual transmitted signal is the sum of these pulses
(not shown). Note that, while the pulses overlap, the samples at t = 0, T, 2T are equal to the
transmitted bits because only one pulse is nonzero at these times.


The concept of Nyquist signaling: Since the sinc pulse is not timelimited to a symbol interval,
in principle, the symbols could interfere with each other. The time domain signal corresponding
to a bandlimited modulation pulse such as the sinc spans an interval significantly larger than the
symbol interval (in theory, the interval is infinitely large, but we always truncate the waveform
in implementations). This means that successive pulses corresponding to successive symbols
which are spaced by the symbol interval (i.e., b[n]p(t - nT) as we increment n) overlap with,
and therefore can interfere with, each other. Figure 4.7 shows the sinc pulse modulated by three
bits, +1, -1, +1. While the pulses corresponding to the three symbols do overlap, notice that, by
sampling at t = 0, t = T and t = 2T, we can recover the three symbols because exactly one of the
pulses is nonzero at each of these times. That is, at sampling times spaced by integer multiples of
the symbol time T, there is no intersymbol interference. We call such a pulse Nyquist for signaling
at rate 1/T, and we discuss other examples of such pulses soon. Designing pulses based on the
Nyquist criterion allows us the freedom to expand the modulation pulses in time beyond the
symbol interval (thus enabling better containment in the frequency domain), while ensuring that
there is no ISI at appropriately chosen sampling times despite the significant overlap between
successive pulses.
Figure 4.8: The baseband signal for 10 BPSK symbols of alternating signs, modulated using the
sinc pulse. The first symbol is +1, and the sample at time t = 0, marked with V, equals +1, as
desired (no ISI). However, if the sampling time is off by 0.25T, the sample value, marked by ’+’,
becomes much smaller because of ISI. While it still has the right sign, the ISI causes it to have
significantly smaller noise immunity. See Problem 4.14 for an example in which the ISI due to
timing mismatch actually causes the sign to flip.




The problem with sinc: Are we done then? Should we just use linear modulation with a sinc
pulse when confronted with a bandlimited channel? Unfortunately, the answer is no: just as the
rectangular timelimited pulse decays too slowly in frequency, the rectangular bandlimited pulse,
corresponding to the sinc pulse in the time domain, decays too slowly in time. Let us see what
happens as a consequence. Figure 4.8 shows a plot of the modulated waveform for a bit sequence
of alternating sign. At the correct sampling times, there is no ISI. However, if we consider a small
timing error of 0.25T, the ISI causes the sample value to drop drastically, making the system
more vulnerable to noise. What is happening is that, when there is a small sampling offset,
we can make the ISI add up to a large value by choosing the interfering symbols so that their
contributions all have signs opposite to that of the desired symbol at the sampling time. Since
the sinc pulse decays as 1/t, the ISI created for a given symbol by an interfering symbol which
is n symbol intervals away decays as 1/n, so that, in the worst case, the contributions from the
interfering symbols roughly have the form $\sum_n \frac{1}{n}$, a series that is known to diverge. Thus, in
theory, if we do not truncate the sinc pulse, we can make the ISI arbitrarily large when there is
a small timing offset. In practice, we do truncate the modulation pulse, so that we only see ISI
from a finite number of symbols. However, even when we do truncate, as we see from Figure 4.8,
the slow decay of the sinc pulse means that the ISI adds up quickly, and significantly reduces
the margin of error when noise is introduced into the system.
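
The divergence argument can be illustrated numerically. The sketch below sums the magnitudes of the sinc pulse at the offset sampling instants, which is the ISI obtained when every interfering symbol has the worst possible sign. Time is measured in units of T, symbols are assumed to be ±1, and the function name and truncation lengths are illustrative choices.

```python
import math

def sinc(x):
    """sinc(x) = sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def worst_case_isi(offset, n_neighbors):
    """Sum of |sinc(offset - n)| over n = +-1, ..., +-n_neighbors."""
    return sum(abs(sinc(offset - n)) + abs(sinc(offset + n))
               for n in range(1, n_neighbors + 1))

offset = 0.25                      # timing error of 0.25 T
desired = sinc(offset)             # contribution of the desired symbol
isi_by_span = {n: worst_case_isi(offset, n) for n in (10, 100, 1000)}
```

Each tenfold increase in the number of interfering symbols adds a roughly constant amount to the worst-case ISI, the logarithmic growth expected from a harmonic-type series.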

While the sinc pulse may not be a good idea in practice, the idea of using bandwidth-efficient
Nyquist pulses is a good one, and we now develop it further.


4.3.2  Nyquist  Criterion  for  ISI  Avoidance 

Nyquist signaling: Consider a linearly modulated signal

$$u(t) = \sum_n b[n] p(t - nT)$$

We say that the pulse p(t) is Nyquist (or satisfies the Nyquist criterion) for signaling at rate 1/T
if the symbol-spaced samples of the modulated signal are equal to the symbols (or a fixed scalar
multiple of the symbols); that is, u(kT) = b[k] for all k. In other words, there is no ISI at appropriately
chosen sampling times spaced by the symbol interval.

In the time domain, it is quite easy to see what is required to satisfy the Nyquist criterion. The
samples $u(kT) = \sum_n b[n] p(kT - nT) = b[k]$ (or a scalar multiple of b[k]) for all k if and only
if p(0) = 1 (or some nonzero constant) and p(mT) = 0 for all integers $m \neq 0$. However, for
design of bandwidth efficient pulses, it is important to characterize the Nyquist criterion in the
frequency domain. This is given by the following theorem.

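Theorem 4.3.2 (Nyquist criterion for ISI avoidance): The pulse $p(t) \leftrightarrow P(f)$ is Nyquist
for signaling at rate 1/T if

$$p(mT) = \delta_{m0} = \begin{cases} 1, & m = 0 \\ 0, & m \neq 0 \end{cases}$$ (4.16)

or equivalently,

$$\frac{1}{T} \sum_{k=-\infty}^{\infty} P\left(f + \frac{k}{T}\right) = 1 \quad \text{for all } f$$ (4.17)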

The  proof  of  this  theorem  is  given  in  Section  4.5,  where  we  show  that  both  the  Nyquist  sampling 
theorem,  Theorem  4.3.1,  and  the  preceding  theorem  are  based  on  the  same  mathematical  result, 
that the samples of a time domain signal have a one-to-one mapping with the sum of translated
(or aliased) versions of its Fourier transform.
In this section, we explore the design implications of Theorem 4.3.2. In the frequency domain,
the  translates  of  P(f)  by  integer  multiples  of  1/T  must  add  up  to  a  constant.  As  illustrated  by 
Figure  4.9,  the  minimum  bandwidth  pulse  for  which  this  happens  is  the  ideal  bandlimited  pulse 
over  an  interval  of  length  1/T. 

[Figure 4.9 panels: "Not Nyquist" and "Nyquist with minimum bandwidth", each showing the
translates P(f + 1/T), P(f), P(f - 1/T), spaced by 1/T.]

Figure 4.9: The minimum bandwidth Nyquist pulse is a sinc.


Minimum bandwidth Nyquist pulse: The minimum bandwidth Nyquist pulse is

$$P(f) = \begin{cases} T, & |f| \le \frac{1}{2T} \\ 0, & \text{else} \end{cases}$$

corresponding to the time domain pulse

$$p(t) = \mathrm{sinc}(t/T)$$

As we have already discussed, the sinc pulse is not a good choice in practice because of its slow
decay in time. To speed up the decay in time, we must expand in the frequency domain, while
conforming to the Nyquist criterion. The trapezoidal pulse depicted in Figure 4.10 is an example
of such a pulse.


Figure  4.10:  A  trapezoidal  pulse  which  is  Nyquist  at  rate  1/T.  The  (fractional)  excess  bandwidth 
is  a. 


The role of excess bandwidth: We have noted earlier that the problem with the sinc pulse
arises because of its 1/t decay and the divergence of the harmonic series $\sum_{n=1}^{\infty} \frac{1}{n}$, which implies
that the worst-case contribution from “distant” interfering symbols at a given sampling instant
can blow up. Using the same reasoning, however, a pulse p(t) decaying as $1/t^b$ for b > 1 should
work, since the series $\sum_{n=1}^{\infty} \frac{1}{n^b}$ does converge for b > 1. A faster time decay requires a slower
decay in frequency. Thus, we need excess bandwidth, beyond the minimum bandwidth dictated
by the Nyquist criterion, to fix the problems associated with the sinc pulse. The (fractional)
excess bandwidth for a linear modulation scheme is defined to be the fraction of bandwidth
over the minimum required for ISI avoidance at a given symbol rate. In particular, Figure 4.10
shows that a trapezoidal pulse (in the frequency domain) can be Nyquist for suitably chosen
parameters, since the translates {P(f + k/T)} as shown in the figure add up to a constant. Since
trapezoidal P(f) is the convolution of two boxes in the frequency domain, the time domain pulse
p(t) is the product of two sinc functions, as worked out in the example below. Since each sinc
decays as 1/t, the product decays as $1/t^2$, which implies that the worst-case ISI with timing
mismatch is indeed bounded.
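
The frequency domain criterion (4.17) is easy to check numerically: add up the translates of P(f) and see whether the result is constant. The sketch below does this for a trapezoid with T = 1 and excess bandwidth a = 0.5, and, for contrast, for a box that is narrower than 1/T. These parameter values and function names are assumptions made for illustration.

```python
T = 1.0     # symbol interval (normalized)
A = 0.5     # fractional excess bandwidth assumed for the trapezoid

def trapezoid(f):
    """Trapezoidal P(f): flat at T up to (1-a)/(2T), linear down to 0 at (1+a)/(2T)."""
    lo = (1.0 - A) / (2.0 * T)
    hi = (1.0 + A) / (2.0 * T)
    x = abs(f)
    if x <= lo:
        return T
    if x >= hi:
        return 0.0
    return T * (hi - x) / (hi - lo)

def narrow_box(f):
    """A box of total width 0.8/T, which is less than the symbol rate 1/T."""
    return T if abs(f) <= 0.4 / T else 0.0

def folded_spectrum(P, f, k_max=5):
    """Left-hand side of (4.17): (1/T) * sum over k of P(f + k/T)."""
    return sum(P(f + k / T) for k in range(-k_max, k_max + 1)) / T

test_freqs = [i / 100.0 for i in range(-100, 101)]
folded_trapezoid = [folded_spectrum(trapezoid, f) for f in test_freqs]
folded_box = [folded_spectrum(narrow_box, f) for f in test_freqs]
```

The trapezoid sums to 1 at every test frequency because the falling edge of one translate is exactly complemented by the rising edge of its neighbor, while the narrow box leaves gaps where the sum drops to zero.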
Example 4.3.1 Consider the trapezoidal pulse of excess bandwidth a shown in Figure 4.10.

(a)  Find  an  explicit  expression  for  the  time  domain  pulse  p(t). 

(b)  What  is  the  bandwidth  required  for  a  passband  system  using  this  pulse  operating  at  120 
Mbps  using  64QAM,  with  an  excess  bandwidth  of  25%? 

Solution: (a) It is easy to check that the trapezoid is a convolution of two boxes as follows (we
assume 0 < a < 1):

$$P(f) = \frac{T^2}{a}\, I_{[-\frac{1}{2T}, \frac{1}{2T}]}(f) * I_{[-\frac{a}{2T}, \frac{a}{2T}]}(f)$$

Taking inverse Fourier transforms, we obtain

$$p(t) = \frac{T^2}{a} \left(\frac{1}{T}\, \mathrm{sinc}(t/T)\right) \left(\frac{a}{T}\, \mathrm{sinc}(at/T)\right) = \mathrm{sinc}(t/T)\, \mathrm{sinc}(at/T)$$ (4.18)

The presence of the first sinc provides the zeros required by the time domain Nyquist criterion:
p(mT) = 0 for nonzero integers m. The presence of a second sinc yields a $1/t^2$ decay,
providing robustness against timing mismatch.

(b) Since $64 = 2^6$, the use of 64QAM corresponds to sending 6 bits/symbol, so that the symbol
rate is 120/6 = 20 Msymbols/sec. The minimum bandwidth required is therefore 20 MHz, so
that 25% excess bandwidth corresponds to a bandwidth of 20 x 1.25 = 25 MHz.
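
A quick numerical check of the two properties claimed for (4.18), zeros at nonzero multiples of T and a $1/t^2$ envelope, can be sketched as follows. The values T = 1 and a = 0.25 and the function name are assumptions for illustration; the envelope used is the bound $T^2/(\pi^2 a t^2)$ that follows from $|\mathrm{sinc}(x)| \le 1/(\pi |x|)$.

```python
import math

def sinc(x):
    """sinc(x) = sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

T = 1.0      # symbol interval (normalized)
A = 0.25     # fractional excess bandwidth

def trapezoid_pulse(t):
    """Time domain pulse (4.18) for the trapezoidal spectrum."""
    return sinc(t / T) * sinc(A * t / T)

def envelope(t):
    """Upper bound T^2 / (pi^2 a t^2) on |p(t)| for t != 0."""
    return T * T / (math.pi ** 2 * A * t * t)

symbol_spaced = [trapezoid_pulse(m * T) for m in range(-5, 6)]
off_grid_times = [2.5, 10.5, 50.5]
within_envelope = [abs(trapezoid_pulse(t)) <= envelope(t) for t in off_grid_times]
```

The symbol-spaced samples are zero (up to floating-point rounding) everywhere except at m = 0, which is the time domain statement of the Nyquist property.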


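Raised cosine pulse: Replacing the straight line of the trapezoid with a smoother cosine-shaped
curve in the frequency domain gives us the raised cosine pulse shown in Figure 4.12,
which has a faster, $1/t^3$, decay in the time domain.

$$P(f) = \begin{cases} T, & |f| \le \frac{1-a}{2T} \\ \frac{T}{2}\left[1 + \cos\left(\left(|f| - \frac{1-a}{2T}\right)\frac{\pi T}{a}\right)\right], & \frac{1-a}{2T} \le |f| \le \frac{1+a}{2T} \\ 0, & |f| > \frac{1+a}{2T} \end{cases}$$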

where a is the fractional excess bandwidth, typically chosen in the range where 0 < a < 1. As
shown in Problem 4.11, the time domain pulse p(t) is given by

$$p(t) = \mathrm{sinc}\left(\frac{t}{T}\right) \frac{\cos\left(\pi a \frac{t}{T}\right)}{1 - \left(\frac{2at}{T}\right)^2}$$

This pulse inherits the Nyquist property of the sinc pulse, while having an additional multiplicative
factor that gives an overall $1/t^3$ decay with time. The faster time decay compared to the
sinc pulse is evident from a comparison of Figures 4.12(b) and 4.11(b).
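
The time domain raised cosine pulse can be evaluated directly, provided the removable singularity at $t = \pm T/(2a)$ is handled, where the cosine factor and the denominator both vanish and their ratio tends to $\pi/4$. The sketch below assumes T = 1 and a = 0.5; the function name and tolerance are illustrative.

```python
import math

def sinc(x):
    """sinc(x) = sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def raised_cosine(t, T=1.0, a=0.5):
    """Time domain raised cosine pulse with fractional excess bandwidth a."""
    x = 2.0 * a * t / T
    den = 1.0 - x * x
    if abs(den) < 1e-9:
        # limit of cos(pi x / 2) / (1 - x^2) as |x| -> 1 is pi / 4
        return sinc(t / T) * math.pi / 4.0
    return sinc(t / T) * math.cos(math.pi * a * t / T) / den

symbol_spaced = [raised_cosine(float(m)) for m in range(0, 6)]

# Compare magnitudes well away from the origin, between two zero crossings.
t_far = 10.5
rc_far = abs(raised_cosine(t_far))
sinc_far = abs(sinc(t_far))
```

At t = 10.5T the raised cosine pulse is already more than two orders of magnitude smaller than the sinc, which is the faster decay that limits ISI under timing mismatch.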

We  now  comment  on  some  interesting  properties  of  Nyquist  pulses. 

For both the trapezoidal and raised cosine waveforms, the time domain pulse has a sinc(at) term
which provides zeros at integer multiples of T = 1/a. This means that the pulse is Nyquist at rate
1/T = a. In other words, a time domain factor which provides “zeros at rate a” (i.e., spaced by
1/a) enables Nyquist signaling at rate a. A pulse which is trapezoidal has a time domain pulse
of the form sinc(at) sinc(bt), which provides zeros at rate a as well as at rate b. Thus, this pulse is
Nyquist at rate a and at rate b.

It is also interesting to note that, once we have zeros at integer multiples of T, we also have zeros
at integer multiples of KT, where K is any positive integer. In other words, if a pulse is Nyquist
at rate 1/T, then it is also Nyquist at integer submultiples of this rate; that is, it is Nyquist for all
rates of the form 1/(KT) for K a positive integer. Thus, a factor sinc(at) in the pulse guarantees
the Nyquist property for all rates a/K.

Of  course,  we  are  typically  only  interested  in  the  highest  rate  for  a  given  bandwidth,  but  it  is 
interesting  to  play  with  the  preceding  observations,  as  we  do  in  the  following  example  involving 
a  trapezoidal  pulse. 
(a) Frequency domain boxcar

(b) Time domain sinc pulse

Figure 4.11: Sinc pulse for minimum bandwidth ISI-free signaling at rate 1/T. Both time and
frequency axes are normalized to be dimensionless.

(a) Frequency domain raised cosine

Figure 4.12: Raised cosine pulse for minimum bandwidth ISI-free signaling at rate 1/T, with
excess bandwidth a. Both time and frequency axes are normalized to be dimensionless.

Figure 4.13: Pulse for Example 4.3.2.

Figure 4.14: The frequency domain Nyquist criterion is satisfied for 1/T = 14 Msymbols/sec in
Example 4.3.2(a).


Example 4.3.2 Consider passband linear modulation using the bandlimited pulse shown in
Figure 4.13. Answer the following True/False questions, clearly stating your reasoning.

(a) True or False: The pulse p(t) can be used for Nyquist signaling at a bit rate of 56 Mbps
using a 16QAM constellation.

(b) True or False: The pulse p(t) can be used for Nyquist signaling at a bit rate of 21 Mbps
using an 8PSK constellation.

(c) True or False: The pulse p(t) can be used for Nyquist signaling at a bit rate of 18 Mbps
using an 8PSK constellation.

(d) True or False: The pulse p(t) can be used for Nyquist signaling at a bit rate of 25 Mbps
using a QPSK constellation.

Solution: (a) The symbol rate is

$$\frac{1}{T} = \frac{56 \text{ Mbps}}{4 \text{ bits/symbol}} = 14 \text{ Msymbols/sec}$$

From Figure 4.14, we see that for this rate, the frequency domain Nyquist criterion is satisfied:
$\sum_k P(f + \frac{k}{T})$ is constant. Alternatively, we know that the frequency domain trapezoid corresponds
to p(t) = sinc(at) sinc(bt) in the time domain, where (a - b)/2 = 4, (a + b)/2 = 10. Solving,
we obtain a = 14 MHz, b = 6 MHz. Thus, the time domain pulse provides zeros at rates 14 MHz
and 6 MHz, hence it is indeed Nyquist at rate 14 Msymbols/sec. The statement is therefore
True.

(b) The symbol rate is 1/T = (21 Mbps)/(3 bits/symbol) = 7 Msymbols/sec. Since the pulse is Nyquist
at 14 Msymbols/sec, it is also Nyquist at 14/2 = 7 Msymbols/sec. The statement is therefore
True.

(c) The symbol rate is 1/T = (18 Mbps)/(3 bits/symbol) = 6 Msymbols/sec. As shown in (a), the pulse
has a sinc(6t) term that provides zeros at rate 6 MHz, hence the statement is True.

(d) The symbol rate is 1/T = (25 Mbps)/(2 bits/symbol) = 12.5 Msymbols/sec. This is not an integer
submultiple of either 14 MHz or 6 MHz, the rates at which zeros are provided by the two sinc
factors. Thus, the Nyquist property does not hold, and the statement is False.
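
The four answers in Example 4.3.2 can be cross-checked in the time domain by sampling p(t) = sinc(14t) sinc(6t), with t in microseconds, at multiples of the symbol interval. The sketch below takes a = 14 MHz and b = 6 MHz from the solution to part (a); the function names, the number of samples checked and the tolerance are assumptions for illustration.

```python
import math

def sinc(x):
    """sinc(x) = sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

A_MHZ, B_MHZ = 14.0, 6.0    # rates of the two sinc factors, from part (a)

def p(t_us):
    """Time domain pulse for the trapezoid of Figure 4.13 (t in microseconds)."""
    return sinc(A_MHZ * t_us) * sinc(B_MHZ * t_us)

def is_nyquist(symbol_rate_msps, n_check=50, tol=1e-9):
    """True if p(m T) vanishes for m = 1..n_check, with T = 1 / symbol rate."""
    return all(abs(p(m / symbol_rate_msps)) < tol for m in range(1, n_check + 1))

# (bit rate in Mbps, constellation size) for parts (a) through (d)
cases = {"a": (56.0, 16), "b": (21.0, 8), "c": (18.0, 8), "d": (25.0, 4)}
answers = {}
for label, (bit_rate_mbps, size) in cases.items():
    symbol_rate = bit_rate_mbps / math.log2(size)
    answers[label] = is_nyquist(symbol_rate)
```

Only negative-free sample indices are checked because the pulse is even in t. A finite check of this kind cannot prove the Nyquist property, but a single nonzero sample, as in part (d), is enough to disprove it.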


4.3.3  Bandwidth  efficiency 

We define the bandwidth efficiency of linear modulation with an M-ary alphabet as

$$\eta_B = \log_2 M \text{ bits/symbol}$$

The Nyquist criterion for ISI avoidance says that the minimum bandwidth required for ISI-free
transmission using linear modulation equals the symbol rate, using the sinc as the modulation
pulse. For such an idealized system, we can think of $\eta_B$ as bits/second per Hertz, since the symbol
rate equals the bandwidth. Thus, knowing the bit rate $R_b$ and the bandwidth efficiency $\eta_B$ of
the modulation scheme, we can determine the symbol rate, and hence the minimum required
bandwidth $B_{min}$, as follows:

$$B_{min} = \frac{R_b}{\eta_B}$$

This bandwidth would then be expanded by the excess bandwidth used in the modulating pulse.
However, this is not included in our definition of bandwidth efficiency, because excess bandwidth
is a highly variable quantity dictated by a variety of implementation considerations. Once we
decide on the fractional excess bandwidth a, the actual bandwidth required is

$$B = (1 + a) B_{min} = (1 + a) \frac{R_b}{\eta_B}$$
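
The two bandwidth formulas above translate directly into code. The following sketch uses illustrative function names and reproduces numbers that appear in Examples 4.3.1 and 4.3.2.

```python
import math

def bandwidth_efficiency(constellation_size):
    """eta_B = log2(M), in bits/symbol."""
    return math.log2(constellation_size)

def min_bandwidth_hz(bit_rate_bps, constellation_size):
    """B_min = R_b / eta_B: the symbol rate, i.e. the Nyquist minimum bandwidth."""
    return bit_rate_bps / bandwidth_efficiency(constellation_size)

def required_bandwidth_hz(bit_rate_bps, constellation_size, excess_bandwidth):
    """B = (1 + a) * B_min for fractional excess bandwidth a."""
    return (1.0 + excess_bandwidth) * min_bandwidth_hz(bit_rate_bps, constellation_size)

# 120 Mbps with 64QAM and 25% excess bandwidth, as in Example 4.3.1(b)
bw_64qam = required_bandwidth_hz(120e6, 64, 0.25)
# 56 Mbps with 16QAM and no excess bandwidth, as in Example 4.3.2(a)
bmin_16qam = min_bandwidth_hz(56e6, 16)
```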


4.3.4  Power-bandwidth  tradeoffs:  a  sneak  preview 

Clearly,  we  can  increase  bandwidth  efficiency  simply  by  increasing  M,  the  constellation  size. 
For  example,  the  bandwidth  efficiency  of  QPSK  is  2  bits/symbol,  while  that  of  16QAM  is  4 
bits/symbol.  What  stops  us  from  increasing  constellation  size,  and  hence  bandwidth  efficiency, 
indefinitely  is  noise,  and  the  fact  that  we  cannot  use  arbitrarily  large  transmit  power  (typically 
limited by cost or physical and regulatory constraints) to overcome it. Noise in digital communication
systems must be modeled statistically, hence rigorous discussion of a formal model and its
design consequences is postponed to Chapters 5 and 6. However, that does not prevent us from
giving a handwaving sneak preview of the bottom line here. Note that this subsection is meant
as  a  teaser:  it  can  be  safely  skipped,  since  these  issues  are  covered  in  detail  in  Chapter  6. 


Figure  4.15:  Scaling  of  minimum  distance  and  energy  per  symbol. 


Intuitively speaking, the effect of noise is to perturb constellation points from the nominal
locations shown in Figure 4.4, which leads to the possibility of making an error in deciding
which point was transmitted. For a given noise “strength” (which determines how much movement
the noise can produce), the closer the constellation points, the greater the possibility of
such errors. In particular, as we shall see in Chapter 6, the minimum distance between
constellation points, termed $d_{\min}$, provides a good measure of how vulnerable we are to
noise. For a given constellation shape, we can increase $d_{\min}$ simply by scaling up the
constellation, as shown in Figure 4.15, but this comes with a corresponding increase in energy
expenditure. To quantify this, define the energy per symbol $E_s$ for a constellation as the
average of the squared Euclidean distances of the points from the origin. For an $M$-ary
constellation, each symbol carries $\log_2 M$ bits of information, and we can define the
average energy per bit $E_b$ as $E_b = \frac{E_s}{\log_2 M}$. Specifically, $d_{\min}$
increases from 2 to 4 by scaling as shown in Figure 4.15. Correspondingly, $E_s = 2$ and
$E_b = 1$ are increased to $E_s = 8$ and $E_b = 4$ in Figure 4.15(b). Thus, doubling the
minimum distance in Figure 4.15 leads to a four-fold increase in $E_s$ and $E_b$. However, the
quantity $d_{\min}^2 / E_b$ does not change due to scaling; it depends only on the relative
geometry of the constellation points. We therefore adopt this scale-invariant measure as our
notion of power efficiency for a constellation:


$$\eta_P = \frac{d_{\min}^2}{E_b} \qquad (4.19)$$


Since this quantity is scale-invariant, we can choose any convenient scaling in computing it:
for QPSK, choosing the scaling on the left in Figure 4.15, we have $d_{\min} = 2$, $E_s = 2$,
$E_b = 1$, which gives $\eta_P = 4$.
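
A quick numerical check of this scale invariance can be sketched in Python. The helper names `square_qam` and `power_efficiency`, and the choice of coordinates (QPSK at $\pm 1 \pm j$, 16QAM on the levels $\pm 1, \pm 3$), are illustrative assumptions of ours rather than notation from the text; they are chosen so that the unscaled QPSK constellation has $d_{\min} = 2$.

```python
import math
from itertools import combinations


def square_qam(levels, scale=1.0):
    """Square constellation with the given I/Q levels (hypothetical helper)."""
    return [scale * complex(i, q) for i in levels for q in levels]


def power_efficiency(points):
    """Return (d_min, E_s, E_b, eta_P) for a list of complex points."""
    m = len(points)
    d_min = min(abs(a - b) for a, b in combinations(points, 2))
    e_s = sum(abs(p) ** 2 for p in points) / m   # average energy per symbol
    e_b = e_s / math.log2(m)                     # average energy per bit
    return d_min, e_s, e_b, d_min ** 2 / e_b


if __name__ == "__main__":
    for name, pts in [("QPSK", square_qam([-1, 1])),
                      ("QPSK scaled by 2", square_qam([-1, 1], scale=2.0)),
                      ("16QAM", square_qam([-3, -1, 1, 3]))]:
        print(name, power_efficiency(pts))
```

Under these assumptions, both QPSK scalings yield $\eta_P = 4$ (up to floating point rounding), and the 16QAM constellation yields $\eta_P = 8/5$.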

It is important to understand how these quantities relate to physical link parameters. For a
given bit rate $R_b$ and received power $P_{RX}$, the energy per bit is given by
$E_b = \frac{P_{RX}}{R_b}$. It is worth verifying that the units make sense: the numerator has
units of watts, or joules/sec, while the denominator has units of bits/sec, so that $E_b$ has
units of joules/bit. We shall see in Chapter 6 that the reliability of communication is
determined by the power efficiency $\eta_P$ (a scale-invariant quantity which is a function of
the constellation shape) and the dimensionless signal-to-noise ratio (SNR) measure $E_b/N_0$,
where $N_0$ is the noise power spectral density, which has units of watts/Hz, or joules.
Specifically, the reliability can be approximately characterized by the product
$\eta_P E_b / N_0$, so that, for a given desired reliability, the required energy per bit (and
hence power) scales inversely as power efficiency for a fixed bit rate. Communication link
designers use such concepts as the basis for forming a “link budget” that can be used to choose
link parameters such as transmit power, antenna gains and range.

Even  based  on  these  rather  sketchy  and  oversimplified  arguments,  we  can  draw  quick  conclusions 
on  the  power-bandwidth  tradeoffs  in  using  different  constellations,  as  shown  in  the  following 
example. 


Example 4.3.3 We wish to design a passband communication system operating at a bit rate of
40 Mbps.

(a) What is the bandwidth required if we employ QPSK, with an excess bandwidth of 25%?

(b) What if we now employ 16QAM, again with excess bandwidth 25%?

(c) Suppose that the QPSK system in (a) attains a desired reliability when the transmit power is
50 mW. Give an estimate of the transmit power needed for the 16QAM system in (b) to attain
a similar reliability.

(d) How do the bandwidth and transmit power required change for the QPSK system if we
increase the bit rate to 80 Mbps?

(e) How do the bandwidth and transmit power required change for the 16QAM system if we
increase the bit rate to 80 Mbps?

Solution:  (a)  The  bandwidth  efficiency  of  QPSK  is  2  bits/symbol,  hence  the  minimum  bandwidth 
required  is  20  MHz.  For  excess  bandwidth  of  25%,  the  bandwidth  required  is  25  MHz. 

(b) The bandwidth efficiency of 16QAM is 4 bits/symbol, hence, reasoning as in (a), the
bandwidth required is 12.5 MHz.

(c) We wish to set $\eta_P E_b/N_0$ to be equal for both systems in order to keep the
reliability roughly the same. Assuming that the noise PSD $N_0$ is the same for both systems,
the required $E_b$ scales as $1/\eta_P$. Since the bit rates $R_b$ for both systems are equal,
the required received power $P = E_b R_b$ (and hence the required transmit power, assuming that
received power scales linearly with transmit power) also scales as $1/\eta_P$. We already know
that $\eta_P = 4$ for QPSK. It remains to find $\eta_P$ for 16QAM, which is shown in Problem
4.15 to equal 8/5. We therefore conclude that the transmit power for the 16QAM system can be
estimated as

$$P_T(\text{16QAM}) = P_T(\text{QPSK}) \, \frac{\eta_P(\text{QPSK})}{\eta_P(\text{16QAM})}$$

which evaluates to 125 mW.

(d) For fixed bandwidth efficiency, required bandwidth scales linearly with bit rate, hence the
new bandwidth required is 50 MHz. In order to maintain a given reliability, we must maintain
the same value of $\eta_P E_b/N_0$ as in (c). The power efficiency $\eta_P$ is unchanged, since
we are using the same constellation. Assuming that the noise PSD $N_0$ is unchanged, the
required energy per bit $E_b$ is unchanged, hence transmit power must scale up linearly with
bit rate $R_b$. Thus, the power required using QPSK is now 100 mW.

(e)  Arguing  as  in  (d),  we  require  a  bandwidth  of  25  MHz  and  a  power  of  250  mW  for  16QAM, 
using  the  results  in  (b)  and  (c). 
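
The arithmetic in this example can be reproduced with a short Python sketch. The function names are hypothetical; the power scaling assumes, as in the solution, that $N_0$ is fixed, that received power is proportional to transmit power, and that reliability is governed by the product of the power efficiency and $E_b/N_0$.

```python
import math


def required_bandwidth(bit_rate, m, excess_bw):
    """B = (1 + a) * R_b / log2(M), in the same units as bit_rate."""
    return (1.0 + excess_bw) * bit_rate / math.log2(m)


def scaled_power(p_ref, eta_ref, rate_ref, eta_new, rate_new):
    """Power that keeps eta_P * E_b / N_0 fixed, using P = E_b * R_b."""
    return p_ref * (eta_ref / eta_new) * (rate_new / rate_ref)


# Power efficiencies quoted in the text for QPSK and 16QAM.
ETA_QPSK, ETA_16QAM = 4.0, 8.0 / 5.0

if __name__ == "__main__":
    for rate in (40e6, 80e6):
        for name, m, eta in (("QPSK", 4, ETA_QPSK), ("16QAM", 16, ETA_16QAM)):
            bw = required_bandwidth(rate, m, 0.25) / 1e6
            pw = scaled_power(50.0, ETA_QPSK, 40e6, eta, rate)
            print(f"{name} at {rate / 1e6:.0f} Mbps: {bw:.1f} MHz, {pw:.0f} mW")
```

The reference point is the 50 mW QPSK system at 40 Mbps from part (c); every other power is obtained by scaling from it.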


4.3.5  The  Nyquist  criterion  at  the  link  level 


Figure 4.16: Nyquist criterion at the link level. (Block diagram; input: symbols $\{b[n]\}$ at rate $1/T$.)


Figure 4.16 shows a block diagram for a link using linear modulation, with the entire model
expressed in complex baseband. The symbols $\{b[n]\}$ are passed through the transmit filter to
obtain the waveform $\sum_n b[n] g_{TX}(t - nT)$. This then goes through the channel filter
$g_C(t)$, and then the receive filter $g_{RX}(t)$. Thus, at the output of the receive filter,
we have the linearly modulated signal $\sum_n b[n] p(t - nT)$, where
$p(t) = (g_{TX} * g_C * g_{RX})(t)$ is the cascade of the transmit, channel and receive
filters. We would like the pulse $p(t)$ to be Nyquist at rate $1/T$, so that, in the absence of
noise, the symbol rate samples at the output of the receive filter equal the transmitted
symbols. Of course, in practice, we do not have control over the channel, hence we often assume
an ideal channel, and design such that the cascade of the transmit and receive filter, given by
$(g_{TX} * g_{RX})(t) \leftrightarrow G_{TX}(f) G_{RX}(f)$, is Nyquist. One possible choice is
to set $G_{TX}$ to be a Nyquist pulse, and $G_{RX}$ to be a wideband filter whose response is
flat over the band of interest. Another choice that is even more popular is to set $G_{TX}(f)$
and $G_{RX}(f)$ to be square roots of a Nyquist pulse. In particular, the square root raised
cosine (SRRC) pulse is often used in practice.

A  framework  for  software  simulations  of  linear  modulated  systems  with  raised  cosine  and  SRRC 
pulses,  including  Matlab  code  fragments,  is  provided  in  the  appendix,  and  provides  a  foundation 
for  Software  Lab  4.1. 

Square root Nyquist pulses and their time domain interpretation: A pulse
$g(t) \leftrightarrow G(f)$ is defined to be square root Nyquist at rate $1/T$ if $|G(f)|^2$ is
Nyquist at rate $1/T$. Note that $P(f) = |G(f)|^2 \leftrightarrow p(t) = (g * g_{MF})(t)$, where
$g_{MF}(t) = g^*(-t)$. The time domain Nyquist condition is given by

$$p(mT) = (g * g_{MF})(mT) = \int g(t) g^*(t - mT) \, dt = \delta_{m0} \qquad (4.20)$$

That is, a square root Nyquist pulse has an autocorrelation function that vanishes at nonzero
integer multiples of $T$. In other words, the waveforms $\{g(t - kT), k = 0, \pm 1, \pm 2, \ldots\}$
are orthonormal, and can be used to provide a basis for constructing more complex waveforms, as
we see in Section 4.3.6.

Food for thought: True or False? Any pulse timelimited to $[0, T]$ is square root Nyquist at
rate $1/T$.


4.3.6 Linear modulation as a building block

Linear  modulation  can  be  used  as  a  building  block  for  constructing  more  sophisticated  waveforms, 
using  discrete-time  sequences  modulated  by  square  root  Nyquist  pulses.  Thus,  one  symbol  would 
be  made  up  of  multiple  “chips,”  linearly  modulated  by  a  square  root  Nyquist  “chip  waveform.” 
Specifically, suppose that $\psi(t)$ is square root Nyquist at a chip rate $\frac{1}{T_c}$. $N$
chips make up one symbol, so that the symbol rate is $\frac{1}{T} = \frac{1}{N T_c}$, and a
symbol waveform is given by linearly modulating a code vector
$\mathbf{s} = (s[0], \ldots, s[N-1])$ consisting of $N$ chips, as follows:

$$s(t) = \sum_{k=0}^{N-1} s[k] \psi(t - kT_c)$$

Since $\{\psi(t - kT_c)\}$ are orthonormal (see (4.20)), we have simply expressed the code vector
in a continuous time basis. Thus, the continuous time inner product between two symbol
waveforms (which determines their geometric relationships and their performance in noise, as we
see in the next chapter) is equal to the discrete time inner product between the corresponding
code vectors. Specifically, suppose that $s_1(t)$ and $s_2(t)$ are two symbol waveforms
corresponding to code vectors $\mathbf{s}_1$ and $\mathbf{s}_2$, respectively. Then their inner
product satisfies

$$\langle s_1, s_2 \rangle = \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} s_1[k] s_2^*[l] \int \psi(t - kT_c) \psi^*(t - lT_c) \, dt = \sum_{k=0}^{N-1} s_1[k] s_2^*[k] = \langle \mathbf{s}_1, \mathbf{s}_2 \rangle$$

where we have used the orthonormality of the translates $\{\psi(t - kT_c)\}$. This means that we
can design discrete time code vectors to have certain desired properties, and then linearly
modulate square root Nyquist chip waveforms to get symbol waveforms that have the same desired
properties. For example, if $\mathbf{s}_1$ and $\mathbf{s}_2$ are orthogonal, then so are
$s_1(t)$ and $s_2(t)$; we use this in the next section when we discuss orthogonal modulation.

Examples of square root Nyquist chip waveforms include a rectangular pulse timelimited to an
interval of length $T_c$, as well as bandlimited pulses such as the square root raised cosine.
From Theorem 4.2.1, we see that the PSD of the modulated waveform is proportional to
$|\Psi(f)|^2$ (it is typically a good approximation to assume that the chips $\{s[k]\}$ are
uncorrelated). That is, the bandwidth occupancy is determined by that of the chip waveform
$\psi$.
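
To see the inner product preservation numerically, here is a minimal Python sketch that assumes the simplest square root Nyquist chip waveform, a rectangular pulse of height $1/\sqrt{T_c}$ on $[0, T_c)$. The function names and the sampling grid are our own illustrative choices; the continuous time integral is approximated by a Riemann sum, which is exact here because the waveforms are piecewise constant.

```python
import math


def modulate(code, tc=1.0, samples_per_chip=8):
    """Samples of s(t) = sum_k s[k] psi(t - k*Tc), where psi is an assumed
    rectangular chip of height 1/sqrt(Tc) on [0, Tc)."""
    amp = 1.0 / math.sqrt(tc)
    return [complex(c) * amp for c in code for _ in range(samples_per_chip)]


def waveform_inner_product(x, y, dt):
    """Riemann-sum approximation of the integral of x(t) * conj(y(t))."""
    return sum(a * b.conjugate() for a, b in zip(x, y)) * dt


def vector_inner_product(s1, s2):
    """Discrete time inner product sum_k s1[k] * conj(s2[k])."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(s1, s2))


if __name__ == "__main__":
    tc, spc = 0.5, 8
    s1, s2 = [1, 1j, -1, 2], [1, -1, 1j, 1]
    x1, x2 = modulate(s1, tc, spc), modulate(s2, tc, spc)
    print("waveforms:", waveform_inner_product(x1, x2, tc / spc))
    print("vectors:  ", vector_inner_product(s1, s2))
```

The two printed values agree up to rounding, so orthogonal code vectors (for example $(1, 1, 1, 1)$ and $(1, -1, 1, -1)$) map to orthogonal waveforms.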


4.4  Orthogonal  and  Biorthogonal  Modulation 

While  linear  modulation  with  larger  and  larger  constellations  is  a  means  of  increasing  bandwidth 
efficiency,  we  shall  see  that  orthogonal  modulation  with  larger  and  larger  constellations  is  a 
means  of  increasing  power  efficiency  (at  the  cost  of  making  the  bandwidth  efficiency  smaller). 
Consider first $M$-ary frequency shift keying (FSK), a classical form of orthogonal modulation
in which one of $M$ sinusoidal tones, successively spaced by $\Delta f$, is transmitted every
$T$ units of time, where $\frac{1}{T}$ is the symbol rate. Thus, the bit rate is
$\frac{\log_2 M}{T}$, and for a typical symbol interval, the transmitted passband signal is
chosen from one of $M$ possibilities:

$$u_{p,k}(t) = \cos\left(2\pi (f_0 + k \Delta f) t\right), \quad 0 \le t \le T, \quad k = 0, 1, \ldots, M-1$$

where we typically have $f_0 \gg \frac{1}{T}$. Taking $f_0$ as reference, the corresponding
complex baseband waveforms are

$$u_k(t) = \exp\left(j 2\pi k \Delta f t\right), \quad 0 \le t \le T, \quad k = 0, 1, \ldots, M-1$$

Let us now understand how the tones should be chosen in order to ensure orthogonality. Recall
that the passband and complex baseband inner products are related as follows:

$$\langle u_{p,k}, u_{p,l} \rangle = \frac{1}{2} \mathrm{Re} \langle u_k, u_l \rangle$$

so we can develop criteria for orthogonality working in complex baseband. Setting $k = l$, we
see that

$$\|u_k\|^2 = T$$

For two adjacent tones, $l = k + 1$, we leave it as an exercise to show that

$$\mathrm{Re} \langle u_k, u_{k+1} \rangle = \frac{\sin 2\pi \Delta f T}{2\pi \Delta f}$$

We see that the minimum value of $\Delta f$ for which the preceding quantity is zero is given by
$2\pi \Delta f T = \pi$, or $\Delta f = \frac{1}{2T}$.

Thus, from the point of view of the receiver, a tone spacing of $\frac{1}{2T}$ ensures that when
there is an incoming wave at the $k$th tone, then correlating against the $k$th tone will give a
large output, but correlating against the $(k+1)$th tone will give zero output (in the absence
of noise). However, this assumes a coherent system in which the tones we are correlating against
are synchronized in phase with the incoming wave. What happens if they are 90° out of phase?
Then correlation of the $k$th tone with itself yields

$$\int_0^T \cos\left(2\pi (f_0 + k \Delta f) t\right) \cos\left(2\pi (f_0 + k \Delta f) t + \frac{\pi}{2}\right) dt = 0$$

(by orthogonality of the cosine and sine), so that the output we desire to be large is actually
zero! Robustness to such variations can be obtained by employing noncoherent reception, which
we describe next.

Noncoherent  reception:  Let  us  develop  the  concept  of  noncoherent  reception  in  generality, 
because it is a concept that is useful in many settings, not just for orthogonal modulation.
Suppose that we transmit a passband waveform, and wish to detect it at the receiver by
correlating it against the receiver’s copy of the waveform. However, the receiver’s local
oscillator may not be synchronized in phase with the incoming wave. Let us denote the
receiver’s copy of the signal as

$$u_p(t) = u_c(t) \cos 2\pi f_c t - u_s(t) \sin 2\pi f_c t$$

and the incoming passband signal as

$$y_p(t) = y_c(t) \cos 2\pi f_c t - y_s(t) \sin 2\pi f_c t = u_c(t) \cos(2\pi f_c t + \theta) - u_s(t) \sin(2\pi f_c t + \theta)$$

Using the receiver’s local oscillator as reference, the complex envelope of the receiver’s copy
is $u(t) = u_c(t) + j u_s(t)$, while that of the incoming wave is $y(t) = u(t) e^{j\theta}$.
Thus, the inner product

$$\langle y_p, u_p \rangle = \frac{1}{2} \mathrm{Re} \langle y, u \rangle = \frac{1}{2} \mathrm{Re} \langle u e^{j\theta}, u \rangle = \frac{1}{2} \mathrm{Re} \left( \|u\|^2 e^{j\theta} \right) = \frac{\|u\|^2}{2} \cos\theta$$

Thus, the output of the correlator is degraded by the factor $\cos\theta$, and can actually
become zero, as we have already observed, if the phase offset $\theta = \pi/2$. In order to get
around this problem, let us look at the complex baseband inner product again:

$$\langle y, u \rangle = \langle u e^{j\theta}, u \rangle = e^{j\theta} \|u\|^2$$

We could ensure that this output remains large regardless of the value of $\theta$ if we took
its magnitude, rather than the real part. Thus, noncoherent reception corresponds to computing
$|\langle y, u \rangle|$ or $|\langle y, u \rangle|^2$. Let us unwrap the complex inner product
to see what this entails:

$$\langle y, u \rangle = \int y(t) u^*(t) \, dt = \int (y_c(t) + j y_s(t)) (u_c(t) - j u_s(t)) \, dt = \left( \langle y_c, u_c \rangle + \langle y_s, u_s \rangle \right) + j \left( \langle y_s, u_c \rangle - \langle y_c, u_s \rangle \right)$$

Thus, the noncoherent receiver computes the quantity

$$|\langle y, u \rangle|^2 = \left( \langle y_c, u_c \rangle + \langle y_s, u_s \rangle \right)^2 + \left( \langle y_s, u_c \rangle - \langle y_c, u_s \rangle \right)^2$$

In contrast, the coherent receiver computes

$$\mathrm{Re} \langle y, u \rangle = \langle y_c, u_c \rangle + \langle y_s, u_s \rangle$$

That  is,  when  the  receiver  LO  is  synchronized  to  the  phase  of  the  incoming  wave,  we  can  correlate 
the  I  component  of  the  received  waveform  with  the  I  component  of  the  receiver’s  copy,  and 
similarly  correlate  the  Q  components,  and  sum  them  up.  However,  in  the  presence  of  phase 
asynchrony,  the  I  and  Q  components  get  mixed  up,  and  we  must  compute  the  magnitude  of  the 
complex  inner  product  to  recover  all  the  energy  of  the  incoming  wave.  Figure  4.17  shows  the 
receiver  operations  corresponding  to  coherent  and  noncoherent  reception. 


Figure  4.17:  Structure  of  coherent  and  noncoherent  receivers. 
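
The contrast between the two receivers can be illustrated with a small Python sketch working on samples of the complex envelope. The sample values and function names are hypothetical; the sum over samples stands in for the complex baseband inner product, and noise is ignored.

```python
import cmath
import math


def inner(y, u):
    """Discrete stand-in for the complex baseband inner product <y, u>."""
    return sum(a * b.conjugate() for a, b in zip(y, u))


def receiver_outputs(u, theta):
    """Return (coherent, noncoherent) statistics when y = u * exp(j*theta)."""
    y = [x * cmath.exp(1j * theta) for x in u]
    ip = inner(y, u)
    return ip.real, abs(ip)   # Re<y,u> versus |<y,u>|


if __name__ == "__main__":
    u = [1 + 1j, -1 + 2j, 0.5 - 1j]   # arbitrary illustrative samples
    for deg in (0, 60, 90):
        coh, noncoh = receiver_outputs(u, math.radians(deg))
        print(f"theta = {deg:3d} deg: coherent {coh:.3f}, noncoherent {noncoh:.3f}")
```

The coherent statistic shrinks as $\|u\|^2 \cos\theta$ and vanishes at 90°, while the noncoherent statistic stays at $\|u\|^2$ for every phase offset.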


Back to FSK: Going back to FSK, if we now use noncoherent reception, then in order to
ensure that we get a zero output (in the absence of noise) when receiving the $k$th tone with a
noncoherent receiver for the $(k+1)$th tone, we must ensure that

$$|\langle u_k, u_{k+1} \rangle| = 0$$

We leave it as an exercise (Problem 4.18) to show that the minimum tone spacing for noncoherent
FSK is $\frac{1}{T}$, which is double that required for orthogonality in coherent FSK. The
bandwidth for coherent $M$-ary FSK is approximately $\frac{M}{2T}$, which corresponds to a
time-bandwidth product of approximately $\frac{M}{2}$. This corresponds to a complex vector
space of dimension $\frac{M}{2}$, or a real vector space of dimension $M$, in which we can fit
$M$ orthogonal signals. On the other hand, $M$-ary noncoherent signaling requires $M$ complex
dimensions, since the complex baseband signals must remain orthogonal even under multiplication
by complex-valued scalars.
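
The two tone spacings can be checked numerically. The following Python sketch (the function name and the number of integration steps are our own assumptions) estimates the complex baseband inner product between adjacent tones by a midpoint rule, and then inspects its real part and its magnitude.

```python
import cmath
import math


def tone_inner_product(delta_f, t_sym=1.0, steps=2000):
    """Midpoint-rule estimate of <u_k, u_{k+1}>, which is the integral over
    [0, T] of exp(-j*2*pi*delta_f*t)."""
    dt = t_sym / steps
    return sum(cmath.exp(-2j * math.pi * delta_f * (n + 0.5) * dt)
               for n in range(steps)) * dt


if __name__ == "__main__":
    for label, df in (("1/(2T)", 0.5), ("1/T", 1.0)):
        ip = tone_inner_product(df)
        print(f"spacing {label}: Re = {ip.real:.6f}, |.| = {abs(ip):.6f}")
```

With $T = 1$, a spacing of $1/(2T)$ gives a real part of essentially zero but a magnitude close to $2/\pi$, so the tones are orthogonal only for a coherent receiver; a spacing of $1/T$ drives the magnitude itself to zero.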

Summarizing the concept of orthogonality: To summarize, when we say “orthogonal”
modulation, we must specify whether we mean coherent or noncoherent reception, because the
concept of orthogonality is different in the two cases. For a signal set $\{s_k(t)\}$,
orthogonality requires that, for $k \ne l$, we have

$$\begin{aligned}
\mathrm{Re} \langle s_k, s_l \rangle &= 0 && \text{coherent orthogonality criterion} \\
\langle s_k, s_l \rangle &= 0 && \text{noncoherent orthogonality criterion}
\end{aligned} \qquad (4.21)$$


Bandwidth efficiency: We conclude from the example of orthogonal FSK that the bandwidth
efficiency of orthogonal signaling is $\eta_B = \frac{\log_2 (2M)}{M}$ bits/complex dimension
for coherent systems, and $\eta_B = \frac{\log_2 M}{M}$ bits/complex dimension for noncoherent
systems. This is a general observation that holds for any realization of orthogonal signaling.
In a signal space of complex dimension $D$ (and hence real dimension $2D$), we can fit $2D$
signals satisfying the coherent orthogonality criterion, but only $D$ signals satisfying the
noncoherent orthogonality criterion. As $M$ gets large, the bandwidth efficiency tends to zero.
In compensation, as we see in Chapter 6, the power efficiency of orthogonal signaling for large
$M$ is the “best possible.”

Orthogonal Walsh-Hadamard codes

Section 4.3.6 shows how to map vectors to waveforms while preserving inner products, by using
linear modulation with a square root Nyquist chip waveform. Applying this construction, the
problem of designing orthogonal waveforms $\{s_i(t)\}$ now reduces to designing orthogonal code
vectors $\{\mathbf{s}_i\}$. Walsh-Hadamard codes are a standard construction employed for this
purpose, and can be constructed recursively as follows: at the $n$th stage, we generate $2^n$
orthogonal vectors, using the $2^{n-1}$ vectors constructed in the $(n-1)$th stage. Let
$\mathbf{H}_n$ denote a matrix whose rows are the $2^n$ orthogonal codes obtained after the
$n$th stage, with $\mathbf{H}_0 = (1)$. Then

$$\mathbf{H}_n = \begin{pmatrix} \mathbf{H}_{n-1} & \mathbf{H}_{n-1} \\ \mathbf{H}_{n-1} & -\mathbf{H}_{n-1} \end{pmatrix}$$

We therefore get

$$\mathbf{H}_1 = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \qquad \mathbf{H}_2 = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix}$$


Figure  4.18  depicts  the  waveforms  corresponding  to  the  4-ary  signal  set  in  H2  using  a  rectangular 
timelimited  chip  waveform  to  go  from  sequences  to  signals,  as  described  in  Section  4.3.6. 
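
The recursive construction translates directly into code. This Python sketch (the function names `hadamard` and `gram` are ours) builds the rows of the $n$th stage matrix and computes all pairwise inner products, so that orthogonality can be checked for any small $n$.

```python
def hadamard(n):
    """Rows of H_n, built by the recursion H_n = [[H, H], [H, -H]] from H_0 = (1)."""
    h = [[1]]
    for _ in range(n):
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def gram(rows):
    """Matrix of pairwise inner products between the (real) rows."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in rows] for r1 in rows]


if __name__ == "__main__":
    for row in hadamard(2):
        print(row)
    print(gram(hadamard(2)))   # diagonal entries 4, off-diagonal entries 0
```

For $2^n$ codes of length $2^n$, the Gram matrix is $2^n$ times the identity: each code has energy $2^n$ and distinct codes are orthogonal.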

The signals $\{s_i\}$ obtained above can be used for noncoherent orthogonal signaling, since
they satisfy the orthogonality criterion $\langle s_i, s_j \rangle = 0$ for $i \ne j$. However,
just as for FSK, we can fit twice as many signals into the same number of degrees of freedom if
we use the weaker notion of orthogonality required for coherent signaling, namely
$\mathrm{Re} \langle s_i, s_j \rangle = 0$ for $i \ne j$. It is easy to check that for $M$-ary
Walsh-Hadamard signals $\{s_i, i = 1, \ldots, M\}$, we can get $2M$ orthogonal signals for
coherent signaling: $\{s_i, j s_i, i = 1, \ldots, M\}$. This construction corresponds to
independently modulating the I and Q components with a Walsh-Hadamard code; that is, using
passband waveforms $s_i(t) \cos 2\pi f_c t$ and $-s_i(t) \sin 2\pi f_c t$ (the negative sign is
only to conform to our convention for I and Q, and can be dropped, which corresponds to
replacing $j s_i$ by $-j s_i$ in complex baseband), $i = 1, \ldots, M$.

Biorthogonal  modulation 

Given an orthogonal signal set, a biorthogonal signal set of twice the size can be obtained by
including a negated copy of each signal. Since signals $s$ and $-s$ cannot be distinguished in a
noncoherent system, biorthogonal signaling is applicable to coherent systems. Thus, for an
$M$-ary Walsh-Hadamard signal set $\{s_i\}$ with $M$ signals obeying the noncoherent
orthogonality criterion, we can construct a coherent orthogonal signal set $\{s_i, j s_i\}$ of
size $2M$, and hence a biorthogonal signal set of size $4M$, e.g., $\{s_i, j s_i, -s_i, -j s_i\}$.
These correspond to the $4M$ passband waveforms $\pm s_i(t) \cos 2\pi f_c t$ and
$\pm s_i(t) \sin 2\pi f_c t$, $i = 1, \ldots, M$.
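
As an illustration, the following Python sketch builds the biorthogonal set from the 4-ary Walsh-Hadamard codes (hard-coded here so that the example is self-contained) and evaluates the real part of the inner product for every pair of signals. The function names are hypothetical.

```python
# Rows of the 4-ary Walsh-Hadamard matrix, used as code vectors.
H2 = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]


def biorthogonal_set(codes):
    """For each code vector s, include s, j*s, -s and -j*s."""
    return [[f * x for x in s] for s in codes for f in (1, 1j, -1, -1j)]


def re_inner(a, b):
    """Real part of the complex inner product <a, b>."""
    return sum(complex(x) * complex(y).conjugate() for x, y in zip(a, b)).real


if __name__ == "__main__":
    sigs = biorthogonal_set(H2)
    values = sorted({re_inner(sigs[i], sigs[j])
                     for i in range(len(sigs)) for j in range(i + 1, len(sigs))})
    print(len(sigs), "signals; distinct pairwise Re<.,.> values:", values)
```

For this 16-signal set, each pair of distinct signals has a real inner product of either 0 (coherently orthogonal) or $-4$ (a signal and its negative).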

Figure 4.18: Walsh-Hadamard codes for 4-ary orthogonal modulation.


4.5  Proofs  of  the  Nyquist  theorems 

We  have  used  Nyquist’s  sampling  theorem,  Theorem  4.3.1,  to  argue  that  linear  modulation 
using the sinc pulse is able to use all the degrees of freedom in a bandlimited channel. On
the other hand, Nyquist’s criterion for ISI avoidance, Theorem 4.3.2, tells us, roughly speaking,
that we must have enough degrees of freedom in order to avoid ISI (and that the sinc pulse
provides  the  minimum  such  degrees  of  freedom).  As  it  turns  out,  both  theorems  are  based  on 
the  same  mathematical  relationship  between  samples  in  the  time  domain  and  aliased  spectra  in 
the  frequency  domain,  stated  in  the  following  theorem. 

Theorem 4.5.1 (Sampling and Aliasing): Consider a signal $s(t)$, sampled at rate
$\frac{1}{T_s}$. Let $S(f)$ denote the spectrum of $s(t)$, and let

$$B(f) = \frac{1}{T_s} \sum_{k=-\infty}^{\infty} S\left(f + \frac{k}{T_s}\right) \qquad (4.22)$$

denote the sum of translates of the spectrum. Then the following observations hold:

(a) $B(f)$ is periodic with period $\frac{1}{T_s}$.

(b) The samples $\{s(nT_s)\}$ are the Fourier series for $B(f)$, satisfying

$$s(nT_s) = T_s \int_{-\frac{1}{2T_s}}^{\frac{1}{2T_s}} B(f) e^{j 2\pi f n T_s} \, df \qquad (4.23)$$

$$B(f) = \sum_{n=-\infty}^{\infty} s(nT_s) e^{-j 2\pi f n T_s} \qquad (4.24)$$


Remark: Note that the signs of the exponents for the frequency domain Fourier series in the
theorem are reversed from the convention in the usual time domain Fourier series (analogous to
the reversal of the sign of the exponent for the inverse Fourier transform compared to the
Fourier transform).

Proof of Theorem 4.5.1: The periodicity of $B(f)$ follows by its very construction. To prove
(b), apply the inverse Fourier transform to obtain

$$s(nT_s) = \int_{-\infty}^{\infty} S(f) e^{j 2\pi f n T_s} \, df$$

We now write the integral as an infinite sum of integrals over segments of length $1/T_s$:

$$s(nT_s) = \sum_{k=-\infty}^{\infty} \int_{\frac{k - 1/2}{T_s}}^{\frac{k + 1/2}{T_s}} S(f) e^{j 2\pi f n T_s} \, df$$

In the integral over the $k$th segment, make the substitution $\nu = f - \frac{k}{T_s}$ and
rewrite it as

$$\int_{-\frac{1}{2T_s}}^{\frac{1}{2T_s}} S\left(\nu + \frac{k}{T_s}\right) e^{j 2\pi (\nu + \frac{k}{T_s}) n T_s} \, d\nu = \int_{-\frac{1}{2T_s}}^{\frac{1}{2T_s}} S\left(\nu + \frac{k}{T_s}\right) e^{j 2\pi \nu n T_s} \, d\nu$$

where the second form follows because $e^{j 2\pi k n} = 1$ for integers $k$ and $n$. Now that the
limits of all segments and the complex exponential in the integrand are the same (i.e.,
independent of $k$), we can move the summation inside to obtain

$$s(nT_s) = \int_{-\frac{1}{2T_s}}^{\frac{1}{2T_s}} \left( \sum_{k=-\infty}^{\infty} S\left(\nu + \frac{k}{T_s}\right) \right) e^{j 2\pi \nu n T_s} \, d\nu = T_s \int_{-\frac{1}{2T_s}}^{\frac{1}{2T_s}} B(\nu) e^{j 2\pi \nu n T_s} \, d\nu$$

proving (4.23). We can now recognize that this is just the formula for the Fourier series
coefficients of $B(f)$, from which (4.24) follows. □
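
Theorem 4.5.1 can be checked numerically for a signal whose spectrum is known in closed form. The sketch below assumes the Gaussian pulse $s(t) = e^{-\pi t^2}$, whose Fourier transform is $S(f) = e^{-\pi f^2}$; this test signal, the truncation limits and the function names are our own choices, not part of the theorem. It compares the sum of spectral translates in (4.22) with the Fourier series built from the samples in (4.24).

```python
import cmath
import math


def s(t):
    """Test signal (an assumption): Gaussian pulse exp(-pi t^2)."""
    return math.exp(-math.pi * t * t)


def spectrum(f):
    """Fourier transform of the test signal: exp(-pi f^2)."""
    return math.exp(-math.pi * f * f)


def aliased_spectrum(f, ts, k_max=40):
    """B(f) = (1/Ts) * sum_k S(f + k/Ts), truncated to |k| <= k_max."""
    return sum(spectrum(f + k / ts) for k in range(-k_max, k_max + 1)) / ts


def series_from_samples(f, ts, n_max=40):
    """sum_n s(n Ts) exp(-j 2 pi f n Ts), truncated to |n| <= n_max."""
    return sum(s(n * ts) * cmath.exp(-2j * math.pi * f * n * ts)
               for n in range(-n_max, n_max + 1))


if __name__ == "__main__":
    ts = 0.8   # slow enough sampling that the translates overlap visibly
    for f in (0.0, 0.2, 0.45, 0.7):
        print(f, aliased_spectrum(f, ts), series_from_samples(f, ts))
```

The two columns agree to within rounding even though the translates overlap, which reflects the fact that the theorem holds regardless of whether the sampling rate is high enough to recover $S(f)$.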


Figure 4.19: Recovering a signal from its samples requires a high enough sampling rate for
translates of the spectrum not to overlap. (Panels: sampling rate not high enough to recover
$S(f)$ from $B(f)$; sampling rate high enough to recover $S(f)$ from $B(f)$.)


Inferring Nyquist’s sampling theorem from Theorem 4.5.1: Suppose that $s(t)$ is bandlimited
to $[-\frac{W}{2}, \frac{W}{2}]$. The samples of $s(t)$ at rate $\frac{1}{T_s}$ can be used to
reconstruct $B(f)$, since they are the Fourier series for $B(f)$. But $S(f)$ can be recovered
from $B(f)$ if and only if the translates $S(f - \frac{k}{T_s})$ do not overlap, as shown in
Figure 4.19. This happens if and only if $\frac{1}{T_s} \ge W$. Once this condition is
satisfied, $\frac{1}{T_s} S(f)$ can be recovered from $B(f)$ by passing it through an ideal
bandlimited filter $H(f) = I_{[-W/2, W/2]}(f)$. We therefore obtain that

$$\frac{1}{T_s} S(f) = B(f) H(f) = \sum_{n=-\infty}^{\infty} s(nT_s) e^{-j 2\pi f n T_s} I_{[-W/2, W/2]}(f) \qquad (4.25)$$

Noting that $I_{[-W/2, W/2]}(f) \leftrightarrow W \mathrm{sinc}(Wt)$, we have

$$e^{-j 2\pi f n T_s} I_{[-W/2, W/2]}(f) \leftrightarrow W \mathrm{sinc}\left(W(t - nT_s)\right)$$

Taking inverse Fourier transforms, we get the interpolation formula

$$\frac{1}{T_s} s(t) = \sum_{n=-\infty}^{\infty} s(nT_s) W \mathrm{sinc}\left(W(t - nT_s)\right)$$

which reduces to (4.15) for $\frac{1}{T_s} = W$. This completes the proof of the sampling
theorem, Theorem 4.3.1. □
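
The interpolation formula can be tried out on a simple bandlimited signal. This Python sketch assumes the test signal $s(t) = \mathrm{sinc}^2(t)$, whose spectrum is a triangle supported on $[-1, 1]$, so that $W = 2$; the infinite sum is truncated, so the reconstruction is only approximate. The function names and the truncation length are our assumptions.

```python
import math


def sinc(x):
    """sinc(x) = sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def s(t):
    """Test signal (an assumption): sinc(t)^2, bandlimited to [-1, 1], so W = 2."""
    return sinc(t) ** 2


def interpolate(t, ts, w=2.0, n_max=2000):
    """s(t) ~= Ts * sum_n s(n Ts) * W * sinc(W (t - n Ts)), for |n| <= n_max."""
    return ts * sum(s(n * ts) * w * sinc(w * (t - n * ts))
                    for n in range(-n_max, n_max + 1))


if __name__ == "__main__":
    t = 0.3
    for ts in (0.25, 0.5, 1.0):   # sampling rates 4, 2 and 1
        print(f"Ts = {ts}: interpolated {interpolate(t, ts):.6f}, true {s(t):.6f}")
```

Sampling at rate 2 or 4 (at least $W$) reproduces the signal to within the truncation error, whereas sampling at rate 1 does not.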


Inferring Nyquist’s criterion for ISI avoidance from Theorem 4.5.1: A Nyquist pulse
$p(t)$ at rate $1/T$ must satisfy $p(nT) = \delta_{n0}$. Applying Theorem 4.5.1 with
$s(t) = p(t)$ and $T_s = T$, it follows immediately from (4.24) that $p(nT) = \delta_{n0}$
(i.e., the time domain Nyquist criterion holds) if and only if

$$B(f) = \frac{1}{T} \sum_{k=-\infty}^{\infty} P\left(f + \frac{k}{T}\right) = 1$$

In other words, if the Fourier series only has a DC term, then the periodic waveform it
corresponds to must be constant. □
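
As a sanity check of this frequency domain condition, the following Python sketch sums the translates of a raised cosine spectrum. It assumes the standard raised cosine form with fractional excess bandwidth $a$, normalized so that the flat part of $P(f)$ equals $T$; the function names are ours.

```python
import math


def raised_cosine_spectrum(f, t_sym=1.0, a=0.5):
    """Standard raised cosine P(f) with excess bandwidth a (assumed form)."""
    f = abs(f)
    lo = (1 - a) / (2 * t_sym)   # end of the flat region
    hi = (1 + a) / (2 * t_sym)   # end of the cosine rolloff
    if f <= lo:
        return t_sym
    if f >= hi:
        return 0.0
    return 0.5 * t_sym * (1 + math.cos(math.pi * t_sym / a * (f - lo)))


def aliased_sum(f, t_sym=1.0, a=0.5, k_max=3):
    """(1/T) * sum_k P(f + k/T); only a few translates are nonzero."""
    return sum(raised_cosine_spectrum(f + k / t_sym, t_sym, a)
               for k in range(-k_max, k_max + 1)) / t_sym


if __name__ == "__main__":
    for f in (0.0, 0.3, 0.45, 0.5):
        print(f, aliased_sum(f))
```

For frequencies within one period of the aliased spectrum, the sum evaluates to 1 for each excess bandwidth tried, consistent with the raised cosine pulse being Nyquist at rate $1/T$.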


4.6  Concept  Summary 

This  chapter  provides  an  introduction  to  how  bits  can  be  translated  to  information-carrying 
signals  which  satisfy  certain  constraints  (e.g.,  fitting  within  a  given  frequency  band).  We  focus 
on  linear  modulation  over  passband  channels. 

Modulation  basics 

• Information bits can be encoded into two-dimensional (complex-valued) constellations, which
can be modulated onto baseband pulses to produce a complex baseband waveform. Constellations
may carry information in both amplitude and phase (e.g., QAM) or in phase only (e.g., PSK).
This modulated waveform can then be upconverted to the appropriate frequency band for
passband signaling.

• The PSD of a linearly modulated waveform using pulse $p(t)$ is proportional to $|P(f)|^2$, so
that the choice of modulating pulse is critical for determining bandwidth occupancy. Fractional
power containment provides a useful notion of bandwidth.

• Time limited pulses with sharp edges have large bandwidth, but this can be reduced by
smoothing out the edges (e.g., by replacing a rectangular pulse with a trapezoidal pulse or by
a sinusoidal pulse).

Degrees  of  freedom 

• Nyquist’s sampling theorem says that a signal bandlimited over $[-W/2, W/2]$ is completely
characterized by its samples at rate $W$ (or higher). Applying this to the complex envelope of a
passband signal of bandwidth $W$, we infer that a passband channel of bandwidth $W$ provides $W$
complex-valued degrees of freedom per unit time for carrying information.

• The (time domain) sinc pulse, which corresponds to a frequency domain boxcar, allows us to
utilize all degrees of freedom in a bandlimited channel, but it decays too slowly, at rate $1/t$,
for practical use: it can lead to unbounded signal amplitude and, in the presence of timing
mismatch, unbounded ISI.

ISI  avoidance 

•  The  Nyquist  criterion  for  ISI  avoidance  requires  that  the  end-to-end  signaling  pulse  vanish 
at  nonzero  integer  multiples  of  the  symbol  time.  In  the  frequency  domain,  this  corresponds  to 
aliased  versions  of  the  pulse  summing  to  a  constant. 

• The sinc pulse is the minimum bandwidth Nyquist pulse, but decays too slowly with time. It
can be replaced, at the expense of some excess bandwidth, by pulses with less sharp transitions
in the frequency domain to obtain faster decay in time. The raised cosine pulse is a popular
choice, giving a $1/t^3$ decay.

•  If  the  receive  filter  is  matched  to  the  transmit  filter,  each  has  to  be  a  square  root  Nyquist  pulse, 
with  their  cascade  being  Nyquist.  The  SRRC  is  a  popular  choice. 

Power-bandwidth  tradeoffs 

•  For  an  M-ary  constellation,  the  bandwidth  efficiency  is  log2  M  bits  per  symbol,  so  that  larger 
constellations  are  more  bandwidth-efficient. 

• The power efficiency for a constellation is well characterized by the scale-invariant quantity
$d_{\min}^2 / E_b$. Large constellations are typically less power-efficient.

Beyond  linear  modulation 

• Linear modulation using square root Nyquist pulses can be used to translate signal design from
discrete time to continuous time while preserving geometric relationships such as inner products.
This is because, if $\psi(t)$ is square root Nyquist at rate $1/T_c$, then $\{\psi(t - kT_c)\}$,
its translates by integer multiples of $T_c$, form an orthonormal basis.

•  Orthogonal  modulation  can  be  used  with  either  coherent  or  noncoherent  reception,  but  the 
concept  of  orthogonality  is  more  stringent  (eating  up  more  degrees  of  freedom)  for  noncoherent 
orthogonal  signaling.  Waveforms  for  orthogonal  modulation  can  be  constructed  in  a  variety 
of  ways,  including  FSK  and  Walsh-Hadamard  sequences  modulated  onto  square  root  Nyquist 
pulses.  Biorthogonal  signaling  doubles  the  signaling  alphabet  for  coherent  orthogonal  signaling 
by  adding  the  negative  of  each  signal  to  the  constellation. 

Sampling  and  aliasing 

• Time domain sampling corresponds to frequency domain aliasing. Specifically, the samples of
a waveform $x(t)$ at rate $1/T$ are the Fourier series for the periodic frequency domain waveform
$\frac{1}{T} \sum_k X(f - k/T)$ obtained by summing the frequency domain waveform and its aliases
$X(f - k/T)$ ($k$ integer).

• The Nyquist sampling theorem corresponds to requiring that the aliased copies are far enough
apart (i.e., the sampling rate is high enough) that we can recover the original frequency domain
waveform by filtering the sum of the aliased waveforms.

•  The  Nyquist  criterion  for  interference  avoidance  requires  that  the  samples  of  the  signaling 
pulse  form  a  discrete  delta  function,  or  that  the  corresponding  sum  of  the  aliased  waveforms  is 
a  constant. 
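
Written out for a pulse $p(t)$ with Fourier transform $P(f)$ and symbol rate $1/T$ (notation assumed here to match the rest of the chapter), the two equivalent forms of the Nyquist criterion for interference avoidance read as follows:

```latex
p(kT) = \delta_{k0} =
\begin{cases} 1, & k = 0 \\ 0, & k \neq 0 \end{cases}
\quad \Longleftrightarrow \quad
\frac{1}{T} \sum_{k=-\infty}^{\infty} P\left(f + \frac{k}{T}\right) = 1
\ \text{ for all } f
```

For example, the sinc pulse $p(t) = \mathrm{sinc}(t/T)$ has $P(f) = T\, I_{[-\frac{1}{2T}, \frac{1}{2T}]}(f)$, so its aliased copies tile the frequency axis without overlap and sum to the constant $T$, as the criterion requires.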


4.7  Endnotes 

While  we  use  linear  modulation  in  the  time  domain  for  our  introduction  to  modulation,  an 
alternative  frequency  domain  approach  is  to  divide  the  available  bandwidth  into  thin  slices,  or 
subcarriers, and to transmit symbols in parallel on each subcarrier. Such a strategy is termed
Orthogonal Frequency Division Multiplexing (OFDM) or multicarrier modulation, and we discuss
it in more detail in Chapter 8. By contrast, the time
domain linear modulation schemes covered here are classified as singlecarrier modulation. In
addition  to  the  degrees  of  freedom  provided  by  time  and  frequency,  additional  spatial  degrees  of 
freedom  can  be  obtained  by  employing  multiple  antennas  at  the  transmitter  and  receiver,  and 
we  provide  a  glimpse  of  such  Multiple  Input  Multiple  Output  (MIMO)  techniques  in  Chapter  8. 

While  the  basic  linear  modulation  strategies  discussed  here,  in  either  singlecarrier  or  multicarrier 
modulation  formats,  are  employed  in  many  existing  and  emerging  communication  systems,  it  is 
worth  mentioning  a  number  of  other  strategies  in  which  modulation  with  memory  is  used  to  shape 
the  transmitted  waveform  in  various  ways,  including  insertion  of  spectral  nulls  (e.g.,  line  codes, 
often  used  for  baseband  wireline  transmission),  avoidance  of  long  runs  of  zeros  and  ones  which 
can  disrupt  synchronization  (e.g.,  runlength  constrained  codes,  often  used  for  magnetic  recording 
channels), controlling variations in the signal envelope (e.g., continuous phase modulation), and
controlling ISI (e.g., partial response signaling). Memory can also be inserted in the manner
that bits are encoded into symbols (e.g., differential encoding for alleviating the need to track
a time-varying channel), without changing the basic linear modulation format. The preceding
discussion,  while  not  containing  enough  detail  to  convey  the  underlying  concepts,  is  meant  to 
provide  keywords  to  facilitate  further  exploration,  with  more  advanced  communication  theory 
texts  such  as  [5,  7,  8]  serving  as  a  good  starting  point. 


Problems 


Timelimited  pulses 

Problem 4.1 (Sine pulse) Consider the sine pulse $p(t) = \sin(\pi t)\, I_{[0,1]}(t)$.

(a) Show that its Fourier transform is given by

$$P(f) = \frac{2 \cos(\pi f)\, e^{-j \pi f}}{\pi (1 - 4 f^2)}$$

(b) Consider the linearly modulated signal $u(t) = \sum_n b[n]\, p(t - n)$, where $b[n]$ are independently
chosen  to  take  values  in  a  QPSK  constellation  (each  point  chosen  with  equal  probability),  and 
the  unit  of  time  is  in  microseconds.  Find  the  95%  power  containment  bandwidth  (specify  the 
units). 

Problem 4.2 Consider the pulse

$$p(t) = \begin{cases} t/a, & 0 \le t \le a, \\ 1, & a \le t \le 1 - a, \\ (1 - t)/a, & 1 - a \le t \le 1, \\ 0, & \text{else,} \end{cases}$$

where $0 \le a \le \frac{1}{2}$.

(a) Sketch $p(t)$ and find its Fourier transform $P(f)$.

(b) Consider the linearly modulated signal $u(t) = \sum_n b[n]\, p(t - n)$, where $b[n]$ take values independently
and with equal probability in a 4-PAM alphabet $\{\pm 1, \pm 3\}$. Find an expression for the
PSD of $u$ as a function of the pulse shape parameter $a$.

(c) Numerically estimate the 95% fractional power containment bandwidth for $u$ and plot it as a
function of $0 \le a \le \frac{1}{2}$. For concreteness, assume the unit of time is 100 picoseconds and specify
the units of bandwidth in your plot.


Basic  concepts  in  Nyquist  signaling 

Problem 4.3 Consider a pulse $s(t) = \mathrm{sinc}(at)\, \mathrm{sinc}(bt)$, where $a > b$.

(a)  Sketch  the  frequency  domain  response  S(f)  of  the  pulse. 

(b)  Suppose  that  the  pulse  is  to  be  used  over  an  ideal  real  baseband  channel  with  one-sided 
bandwidth  400  Hz.  Choose  a  and  b  so  that  the  pulse  is  Nyquist  for  4-PAM  signaling  at  1200 
bits/sec  and  exactly  fills  the  channel  bandwidth. 

(c)  Now,  suppose  that  the  pulse  is  to  be  used  over  a  passband  channel  spanning  the  frequencies 
2.4-2.42  GHz.  Assuming  that  we  use  64-QAM  signaling  at  60  Mbits/sec,  choose  a  and  b  so  that 
the  pulse  is  Nyquist  and  exactly  fills  the  channel  bandwidth. 

(d) Sketch an argument showing that the magnitude of the transmitted waveform in the preceding
settings is always finite.

Problem 4.4 Consider the pulse $p(t)$ whose Fourier transform satisfies:

$$P(f) = \begin{cases} 1, & 0 \le |f| \le A \\ \frac{B - |f|}{B - A}, & A \le |f| \le B \\ 0, & \text{else} \end{cases}$$

where $A = 250$ kHz and $B = 1.25$ MHz.

(a) True or False The pulse $p(t)$ can be used for Nyquist signaling at rate 3 Mbps using an
8-PSK  constellation. 

(b)  True  or  False  The  pulse  p(t)  can  be  used  for  Nyquist  signaling  at  rate  4.5  Mbps  using  an 
8-PSK  constellation. 


Problem  4.5  Consider  the  pulse 

$$p(t) = \begin{cases} 1, & 0 \le |t| \le T \\ 0, & \text{else} \end{cases}$$

Let $P(f)$ denote the Fourier transform of $p(t)$.

(a) True or False The pulse $p(t)$ is Nyquist at rate $1/T$.

(b) True or False The pulse $p(t)$ is square root Nyquist at rate $1/T$ (i.e., $|P(f)|^2$ is Nyquist at
rate $1/T$).




Problem  4.6  Consider  Nyquist  signaling  at  80  Mbps  using  a  16QAM  constellation  with  50% 
excess  bandwidth.  The  signaling  pulse  has  spectrum  shown  in  Figure  4.20. 

(a)  Find  the  values  of  a  and  b  in  the  figure,  making  sure  you  specify  the  units. 

(b)  True  or  False  The  pulse  is  also  Nyquist  for  signaling  at  20  Mbps  using  QPSK.  (Justify  your 
answer.) 

Problem 4.7 Consider linear modulation with a signaling pulse $p(t) = \mathrm{sinc}(at)\, \mathrm{sinc}(bt)$, where
$a$ and $b$ are to be determined.

(a) How should $a$ and $b$ be chosen so that $p(t)$ is Nyquist with 50% excess bandwidth for a data
rate of 40 Mbps using 16QAM? Specify the occupied bandwidth.

(b) How should $a$ and $b$ be chosen so that $p(t)$ can be used for Nyquist signaling both for a
16QAM system with 40 Mbps data rate, and for an 8PSK system with 18 Mbps data rate?
Specify the occupied bandwidth.

Problem 4.8 Consider a passband communication link operating at a bit rate of 16 Mbps using
a 256-QAM constellation.

(a) What must we set the unit of time as so that $p(t) = \sin(\pi t)\, I_{[0,1]}(t)$ is square root Nyquist for
the system of interest, while occupying the smallest possible bandwidth?

(b) What must we set the unit of time as so that $p(t) = \mathrm{sinc}(t)\, \mathrm{sinc}(2t)$ is Nyquist for the system
of interest, while occupying the smallest possible bandwidth?


Problem 4.9 Consider passband linear modulation with a pulse of the form $p(t) = \mathrm{sinc}(3t)\, \mathrm{sinc}(2t)$,
where the unit of time is microseconds.

(a) Sketch the spectrum $P(f)$ versus $f$. Make sure you specify the units on the $f$ axis.

(b)  What  is  the  largest  achievable  bit  rate  for  Nyquist  signaling  using  p(t)  if  we  employ  a  16QAM 
constellation?  What  is  the  fractional  excess  bandwidth  for  this  bit  rate? 

(c)  (True  or  False)  The  pulse  p(t)  can  be  used  for  Nyquist  signaling  at  a  bit  rate  of  4  Mbps 
using  a  QPSK  constellation. 

Problem  4.10  (True  or  False)  Any  pulse  timelimited  to  duration  T  is  square  root  Nyquist 
(up  to  scaling)  at  rate  1/T. 


Problem 4.11 (Raised cosine pulse) In this problem, we derive the time domain response of
the frequency domain raised cosine pulse. Let $R(f) = I_{[-\frac{1}{2}, \frac{1}{2}]}(f)$ denote an ideal boxcar transfer
function, and let $C(f) = \frac{\pi}{2a} \cos\left(\frac{\pi f}{a}\right) I_{[-\frac{a}{2}, \frac{a}{2}]}(f)$ denote a cosine transfer function.

(a) Sketch $R(f)$ and $C(f)$, assuming that $0 < a < 1$.

(b) Show that the frequency domain raised cosine pulse can be written as

$$S(f) = (R * C)(f)$$

(c) Find the time domain pulse $s(t) = r(t) c(t)$. Where are the zeros of $s(t)$? Conclude that
$s(t/T)$ is Nyquist at rate $1/T$.

(d) Sketch an argument that shows that, if the pulse $s(t/T)$ is used for BPSK signaling at rate
$1/T$, then the magnitude of the transmitted waveform is always finite.


Software  experiments  with  Nyquist  and  square  root  Nyquist  pulses 

Problem 4.12 (Software exercise for the raised cosine pulse) Code fragment 4.B.1 in the
appendix implements a discrete time truncated raised cosine pulse.

(a) Run the code fragment for 25%, 50% and 100% excess bandwidths and plot the time domain
waveforms versus normalized time $t/T$ over the interval $[-5T, 5T]$, sampling fast enough (e.g.,
at rate $32/T$ or higher) to obtain smooth curves. Comment on the effect of varying the excess
bandwidth on these waveforms.

(b) For excess bandwidth of 50%, numerically explore the effect of time domain truncation on
frequency domain spillage. Specifically, compute the Fourier transform for two cases: truncation
to $[-2T, 2T]$ and truncation to $[-5T, 5T]$, using the DFT as described in code fragment 2.5.1 to
obtain a frequency resolution at least as good as $\frac{1}{64T}$. Plot these Fourier transforms against the
normalized frequency $fT$, and comment on how much of an increase in bandwidth, if any, you see
due to truncation in the two cases.

(c)  Numerically  compute  the  95%  bandwidth  of  the  two  pulses  in  (b),  and  compare  it  with  the 
nominal  bandwidth  without  truncation. 

Problem 4.13 (Software exercise for the SRRC pulse) (a) Write a function for generating
a sampled SRRC pulse, analogous to code fragment 4.B.1, where you can specify the sampling
rate, the excess bandwidth, and the truncation length. The time domain expression for the
SRRC pulse is given by (4.45) in the appendix.

Remark: The zero in the denominator can be handled either by analytical or numerical implementation
of L'Hospital's rule. See comments in code fragment 4.B.1.

(b)  Plot  the  SRRC  pulses  versus  normalized  time  t/T,  for  excess  bandwidths  of  25%,  50%  and 
100%.  Comment  on  the  effect  of  varying  excess  bandwidth  on  these  waveforms. 



Effect  of  timing  errors 


Problem 4.14 (Effect of timing errors) Consider digital modulation at rate $1/T$ using the
sinc pulse $s(t) = \mathrm{sinc}(2Wt)$, with transmitted waveform

$$y(t) = \sum_{n=1}^{100} b_n\, s(t - (n-1)T)$$

where $1/T$ is the symbol rate and $\{b_n\}$ is the bit stream being sent (assume that each $b_n$ takes
one of the values $\pm 1$ with equal probability). The receiver makes bit decisions based on the
samples $r_n = y((n-1)T)$, $n = 1, ..., 100$.

(a) For what value of $T$ (as a function of $W$) is $r_n = b_n$, $n = 1, ..., 100$?

Remark: In this case, we simply use the sign of the nth sample $r_n$ as an estimate of $b_n$.

(b) For the choice of $T$ as in (a), suppose that the receiver sampling times are off by $0.25T$. That
is, the nth sample is given by $r_n = y((n-1)T + 0.25T)$, $n = 1, ..., 100$. In this case, we do have ISI
of different degrees of severity, depending on the bit pattern. Consider the following bit pattern:

$$b_n = \begin{cases} (-1)^{n-1}, & 1 \le n \le 49 \\ (-1)^n, & 50 \le n \le 100 \end{cases}$$

Numerically evaluate the 50th sample $r_{50}$. Does it have the same sign as the 50th bit $b_{50}$?
Remark: The preceding bit pattern creates the worst possible ISI for the 50th bit. Since the sinc
pulse dies off slowly with time, the ISI contributions due to the 99 other bits to the 50th sample
sum up to a number larger in magnitude, and opposite in sign, relative to the contribution due
to $b_{50}$. A decision on $b_{50}$ based on the sign of $r_{50}$ would therefore be wrong. This sensitivity to
timing error is why the sinc pulse is seldom used in practice.

(c) Now, consider the digitally modulated signal in (a) with the pulse $s(t) = \mathrm{sinc}(2Wt)\, \mathrm{sinc}(Wt)$.
For ideal sampling as in (a), what are the two values of $T$ such that $r_n = b_n$?

(d)  For  the  smaller  of  the  two  values  of  T  found  in  (c)  (which  corresponds  to  faster  signaling, 
since  the  symbol  rate  is  1/T),  repeat  the  computation  in  (b).  That  is,  find  r50  and  compare  its 
sign  with  b50  for  the  bit  pattern  in  (b). 

(e)  Find  and  sketch  the  frequency  response  of  the  pulse  in  (c).  What  is  the  excess  bandwidth 
relative  to  the  pulse  in  (a),  assuming  Nyquist  signaling  at  the  same  symbol  rate? 

(f) Discuss the impact of the excess bandwidth on the severity of the ISI due to timing mismatch.

Figure 4.21: 16QAM constellation with scaling chosen for convenient computation of power
efficiency. 
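
The computation requested in Problem 4.14(b) can be sketched in a few lines of Matlab. This is only an illustrative sketch under stated assumptions: time is normalized so that $T = 1$ and the pulse is written as $\mathrm{sinc}(t/T)$, and the helper `sincfun` is a hypothetical name defined inline so that no toolbox function is needed.

```matlab
% Sketch for Problem 4.14(b): sinc pulse with a 0.25T sampling offset.
% Assumption: time is normalized so that T = 1, with s(t) = sinc(t/T).
% sincfun is a hypothetical inline helper: sin(pi x)/(pi x), equal to 1 at x = 0.
sincfun = @(x) (x == 0) + (x ~= 0) .* sin(pi * x) ./ (pi * x + (x == 0));

n = 1:100;                          % symbol indices
b = zeros(1, 100);
b(1:49)   = (-1) .^ (n(1:49) - 1);  % bit pattern for 1 <= n <= 49
b(50:100) = (-1) .^ (n(50:100));    % bit pattern for 50 <= n <= 100

k = 50;                             % sample of interest
tau = 0.25;                         % timing offset as a fraction of T
% r_k = sum_n b_n s((k - 1)T + tau*T - (n - 1)T), evaluated with T = 1
r50 = sum(b .* sincfun((k - 1) + tau - (n - 1)));

desired = b(k) * sincfun(tau);      % contribution of b_50 itself
isi = r50 - desired;                % contribution of the other 99 bits
fprintf('b50 = %d, desired = %.4f, ISI = %.4f, r50 = %.4f\n', ...
        b(k), desired, isi, r50);
```

Splitting $r_{50}$ into the desired term and the ISI term makes it easy to see which of the two dominates; the same script can be reused for part (d) by changing the pulse and the symbol spacing.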


Power-bandwidth  tradeoffs 

Problem  4.15  (Power  efficiency  of  16QAM)  In  this  problem,  we  sketch  the  computation 
of  power  efficiency  for  the  16QAM  constellation  shown  in  Figure  4.21. 

(a) Note that the minimum distance for the particular scaling chosen in the figure is $d_{\min} = 2$.

(b) Show that the constellation points divide into 3 categories based on their distance from the
origin, corresponding to squared distances, or energies, of $1^2 + 1^2$, $1^2 + 3^2$ and $3^2 + 3^2$. Averaging
over these energies (weighting by the number of points in each category), show that the average
energy per symbol is $E_s = 10$.

(c) Using (a) and (b), and accounting for the number of bits/symbol, show that the power
efficiency is given by $\eta_P = \frac{d_{\min}^2}{E_b} = \frac{8}{5}$.
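
The numbers in Problem 4.15 can be checked numerically. The following sketch assumes that the points of Figure 4.21 have coordinates in $\{\pm 1, \pm 3\}$ on each axis, which is what the energies listed in part (b) imply; the variable names are illustrative only.

```matlab
% Sketch for Problem 4.15, assuming the 16QAM points of Figure 4.21 have
% I and Q coordinates in {-3, -1, 1, 3} (consistent with part (b)).
levels = [-3 -1 1 3];
[I, Q] = meshgrid(levels, levels);
pts = I(:) + 1j * Q(:);             % the 16 constellation points

Es = mean(abs(pts) .^ 2);           % average energy per symbol
Eb = Es / log2(numel(pts));         % energy per bit (4 bits/symbol)
dmin = 2;                           % minimum distance, from part (a)
etaP = dmin^2 / Eb;                 % power efficiency d_min^2 / E_b
fprintf('Es = %g, Eb = %g, etaP = %g\n', Es, Eb, etaP);
```

Because $\eta_P$ is scale-invariant, multiplying `levels` (and `dmin`) by any common factor leaves `etaP` unchanged.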

Problem  4.16  (Power-bandwidth  tradeoffs)  A  16QAM  system  transmits  at  50  Mbps  using 
an  excess  bandwidth  of  50%.  The  transmit  power  is  100  mW. 

(a)  Assuming  that  the  carrier  frequency  is  5.2  GHz,  specify  the  frequency  interval  occupied  by 
the  passband  modulated  signal. 

(b)  Using  the  same  frequency  band  in  (a),  how  fast  could  you  signal  using  QPSK  with  the  same 
excess  bandwidth? 

(c)  Estimate  the  transmit  power  needed  in  the  QPSK  system,  assuming  the  same  range  and 
reliability  requirements  as  in  the  16QAM  system. 


Minimum  Shift  Keying 

Problem  4.17  (OQPSK  and  MSK)  Linear  modulation  with  a  bandlimited  pulse  can  perform 
poorly  over  nonlinear  passband  channels.  For  example,  the  output  of  a  passband  hardlimiter 
(which  is  a  good  model  for  power  amplifiers  operating  in  a  saturated  regime)  has  constant 
envelope,  but  a  PSK  signal  employing  a  bandlimited  pulse  has  an  envelope  that  passes  through 
zero  during  a  180  degree  phase  transition,  as  shown  in  Figure  4.22.  One  way  to  alleviate  this 
problem  is  to  not  allow  180  degree  phase  transitions.  Offset  QPSK  (OQPSK)  is  one  example  of 
such  a  scheme,  where  the  transmitted  signal  is  given  by 

$$s(t) = \sum_{n=-\infty}^{\infty} b_c[n]\, p(t - nT) + j\, b_s[n]\, p\left(t - nT - \frac{T}{2}\right) \qquad (4.26)$$

where $\{b_c[n]\}$, $\{b_s[n]\}$ are $\pm 1$ BPSK symbols modulating the I and Q channels, with the I and Q
signals being staggered by half a symbol interval. This leads to phase transitions of at most 90
degrees at integer multiples of the bit time $T_b = \frac{T}{2}$. Minimum Shift Keying (MSK) is a special
case of OQPSK with timelimited modulating pulse

$$p(t) = \sqrt{2}\, \sin\left(\frac{\pi t}{T}\right) I_{[0,T]}(t) \qquad (4.27)$$

Figure 4.22: The envelope of a PSK signal passes through zero during a 180 degree phase
transition, and gets distorted over a nonlinear channel.

(a)  Sketch  the  I  and  Q  waveforms  for  a  typical  MSK  signal,  clearly  showing  the  timing  relationship 
between  the  waveforms. 

(b)  Show  that  the  MSK  waveform  has  constant  envelope  (an  extremely  desirable  property  for 
nonlinear  channels) . 

(c)  Find  an  analytical  expression  for  the  PSD  of  an  MSK  signal,  assuming  that  all  bits  sent  are 
i.i.d.,  taking  values  ±1  with  equal  probability.  Plot  the  PSD  versus  normalized  frequency  fT. 

(d)  Find  the  99%  power  containment  normalized  bandwidth  of  MSK.  Compare  with  the  minimum 
Nyquist  bandwidth,  and  the  99%  power  containment  bandwidth  of  OQPSK  using  a  rectangular 
pulse. 

(e)  Recognize  that  Figure  4.6  gives  the  PSD  for  OQPSK  and  MSK,  and  reproduce  this  figure, 
normalizing  the  area  under  the  PSD  curve  to  be  the  same  for  both  modulation  formats. 
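
Before attempting the analytical argument in part (b) of Problem 4.17, it can help to confirm the constant envelope numerically. The sketch below is a sanity check rather than a proof; the normalization $T = 1$, the oversampling factor and the number of symbols are all arbitrary assumptions.

```matlab
% Numerical check of the constant envelope of MSK (Problem 4.17(b)).
% Assumptions: T = 1, 32 samples per symbol, 20 random symbols per rail.
os = 32;  N = 20;
bc = 2 * randi([0 1], 1, N) - 1;    % I-channel symbols, +/-1
bs = 2 * randi([0 1], 1, N) - 1;    % Q-channel symbols, +/-1

t = (0:os-1) / os;                  % one symbol interval [0, T)
p = sqrt(2) * sin(pi * t);          % MSK pulse (4.27), sampled

% The pulse is limited to one symbol interval, so successive pulses do not
% overlap and each rail is just scaled copies of p laid end to end.
Irail = kron(bc, p);
Qrail = kron(bs, p);

% Delay the Q rail by T/2 (os/2 samples) and keep the overlapping part.
half = os / 2;
s = Irail(half + 1 : end) + 1j * Qrail(1 : end - half);

env = abs(s);
fprintf('envelope min = %.6f, max = %.6f\n', min(env), max(env));
```

Replacing `p` by a rectangular pulse of the same duration gives plain OQPSK, which is a convenient starting point for the comparisons in parts (d) and (e).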


Orthogonal  signaling 

Problem 4.18 (FSK tone spacing) Consider two real-valued passband pulses of the form

$$s_0(t) = \cos(2\pi f_0 t + \phi_0), \quad 0 \le t \le T$$
$$s_1(t) = \cos(2\pi f_1 t + \phi_1), \quad 0 \le t \le T$$

where $f_1 > f_0 \gg 1/T$. The pulses are said to be orthogonal if $\langle s_0, s_1 \rangle = \int_0^T s_0(t) s_1(t)\, dt = 0$.

(a) If $\phi_0 = \phi_1 = 0$, show that the minimum frequency separation such that the pulses are
orthogonal is $f_1 - f_0 = \frac{1}{2T}$.

(b) If $\phi_0$ and $\phi_1$ are arbitrary phases, show that the minimum separation for the pulses to be
orthogonal regardless of $\phi_0$, $\phi_1$ is $f_1 - f_0 = 1/T$.

Remark:  The  results  of  this  problem  can  be  used  to  determine  the  bandwidth  requirements  for 
coherent  and  noncoherent  FSK,  respectively. 
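
The two spacings in Problem 4.18 can be checked numerically before deriving them. The sketch below evaluates the inner product by trapezoidal integration; the values $T = 1$ and $f_0 = 100/T$ are arbitrary assumptions chosen so that $f_0 \gg 1/T$, and `ip` is a hypothetical helper name.

```matlab
% Numerical check of FSK tone spacing (Problem 4.18).
% Assumptions: T = 1 and f0 = 100/T, so that f0 >> 1/T.
T = 1;  f0 = 100 / T;
t = linspace(0, T, 200001);         % fine grid for numerical integration

% Inner product <s0, s1> for a given f1 and phases phi0, phi1.
ip = @(f1, phi0, phi1) trapz(t, cos(2*pi*f0*t + phi0) .* cos(2*pi*f1*t + phi1));

c_half_coh  = ip(f0 + 1/(2*T), 0, 0);      % spacing 1/(2T), phases aligned
c_half_ncoh = ip(f0 + 1/(2*T), 0, pi/2);   % spacing 1/(2T), phase mismatch
c_full_ncoh = ip(f0 + 1/T,     0, pi/2);   % spacing 1/T, phase mismatch
fprintf('%.2e  %.4f  %.2e\n', c_half_coh, c_half_ncoh, c_full_ncoh);
```

With aligned phases the spacing $\frac{1}{2T}$ already gives a negligible inner product, but a phase mismatch destroys orthogonality at that spacing, while the spacing $1/T$ remains orthogonal.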


Problem  4.19  (Walsh-Hadamard  codes) 

(a)  Specify  the  Walsh-Hadamard  codes  for  8-ary  orthogonal  signaling  with  noncoherent  reception. 

(b)  Plot  the  baseband  waveforms  corresponding  to  sending  these  codes  using  a  square  root  raised 
cosine  pulse  with  excess  bandwidth  of  50%. 

(c)  What  is  the  fractional  increase  in  bandwidth  efficiency  if  we  use  these  8  waveforms  as  building 
blocks  for  biorthogonal  signaling  with  coherent  reception? 
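
One common way to generate the codes asked for in Problem 4.19(a) is the recursive (Sylvester) construction, sketched below with Kronecker products. This is a standard construction and is offered as an assumption about what is intended; the ordering of the rows may differ from the one used elsewhere in the text.

```matlab
% Sketch: 8-ary Walsh-Hadamard codes via the Sylvester construction.
H2 = [1 1; 1 -1];
H8 = kron(H2, kron(H2, H2));        % each row is one length-8 code of +/-1 chips

G = H8 * H8.';                      % Gram matrix: pairwise inner products
disp(H8);
disp(G);                            % diagonal matrix when the rows are orthogonal
```

Each row can then be sent chip by chip on the square root raised cosine pulse of part (b), using the same upsample-and-filter approach as in the software lab.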
Figure 4.23: Baseband signals for Problem 4.20


Problem  4.20  The  two  orthogonal  baseband  signals  shown  in  Figure  4.23  are  used  as  building 
blocks  for  constructing  passband  signals  as  follows. 

$$u_p(t) = a(t) \cos 2\pi f_c t - b(t) \sin 2\pi f_c t$$
$$v_p(t) = b(t) \cos 2\pi f_c t - a(t) \sin 2\pi f_c t$$
$$w_p(t) = b(t) \cos 2\pi f_c t + a(t) \sin 2\pi f_c t$$
$$x_p(t) = a(t) \cos 2\pi f_c t + b(t) \sin 2\pi f_c t$$

where $f_c \gg 1$.

(a)  True  or  False  The  signal  set  can  be  used  for  4-ary  orthogonal  modulation  with  coherent 
demodulation. 

(b)  True  or  False  The  signal  set  can  be  used  for  4-ary  orthogonal  modulation  with  noncoherent 
demodulation. 


Bandwidth  occupancy  as  a  function  of  modulation  format 

Problem 4.21 We wish to send at a rate of 10 Mbits/sec over a passband channel. Assuming
that an excess bandwidth of 50% is used, how much bandwidth is needed for each of the
following  schemes:  QPSK,  64-QAM,  and  64-ary  noncoherent  orthogonal  modulation  using  a 
Walsh-Hadamard  code. 

Problem  4.22  Consider  64-ary  orthogonal  signaling  using  Walsh-Hadamard  codes.  Assuming 
that  the  chip  pulse  is  square  root  raised  cosine  with  excess  bandwidth  25%,  what  is  the  bandwidth 
required  for  sending  data  at  20  Kbps  over  a  passband  channel  assuming  (a)  coherent  reception, 
(b)  noncoherent  reception. 


Software  Lab  4.1:  Linear  modulation  over  a  noiseless  ideal  channel 

Lab  Objectives:  This  is  the  first  of  a  sequence  of  software  labs  which  gradually  develop  a 
reasonably  complete  Matlab  simulator  for  a  linearly  modulated  system.  The  follow-on  labs  are 
Software  Lab  6.1  in  Chapter  6,  and  Software  Lab  8.1  in  Chapter  8. 

Reading: Sections 4.3 (signaling over bandlimited channels) and 4.B (simulations with
bandlimited pulses), together with the following background.

Background 

Figure 4.24 shows block diagrams corresponding to a typical DSP-centric realization of a communication
transceiver employing linear modulation. In the labs, we model the core components
of such a system using the complex baseband representation, as shown in Figure 4.25. Given the
equivalence of passband and complex baseband, we are only skipping the modeling of finite precision
effects due to digital-to-analog conversion (DAC) and analog-to-digital conversion (ADC).
These effects can easily be incorporated into Matlab models such as those we develop, but are
beyond our current scope.

Figure 4.24: Typical DSP-centric transceiver realization. Our model does not include the blocks
shown in dashed lines. Finite precision effects such as DAC and ADC are not considered. The
upconversion and downconversion operations are not modeled. The passband channel is modeled
as an LTI system in complex baseband.

Figure 4.25: Block diagram of a linearly modulated system, modeled in complex baseband.


A  few  points  worth  noting  about  the  model  of  Figure  4.25: 

Choice of transmit filter: The PSD of the transmitted signal is proportional to $|G_{TX}(f)|^2$ (see
Chapter 4). The choice of transmit filter is made based on spectral constraints, as well as considerations
such as sensitivity to receiver timing errors and intersymbol interference. Typically,
the bandwidth employed is of the order of $\frac{1}{T}$.

Channel model: We typically model the channel as a linear time-invariant (LTI) system. For
certain applications, such as wireless communications, the channel may be modeled as slowly
time varying.

Noise model: Noise is introduced in a later lab (in Chapter 6).

Receive  filter  and  sampler:  The  optimal  choice  of  receive  filter  is  actually  a  filter  matched  to 
the  cascade  of  the  transmit  filter  and  the  channel.  In  this  case,  there  is  no  information  loss 
in sampling the output of the receive filter at the symbol rate $\frac{1}{T}$. Often, however, we use a
suboptimal choice of receive filter (e.g., a wideband filter flat over the signal band, or a filter
matched to the transmit filter). In this case, it is typically advantageous to sample faster than
the symbol rate. In general, we assume that the sampler operates at rate $\frac{m}{T}$, where $m$ is a positive
integer. The output of the sampler is then processed, typically using digital signal processing
(DSP), to perform receiver functions such as synchronization, equalization and demodulation.

The  simulation  of  a  linearly  modulated  system  typically  involves  the  following  steps. 

Step  1:  Generating  random  symbols  to  be  sent 

We restrict attention in this lab to Binary Phase Shift Keying (BPSK). That is, the symbols
$\{b_n\}$ in Figure 4.25 take values $\pm 1$.

Step  2:  Implementing  the  transmit,  channel,  and  receive  filters 

Since the bandwidth of these filters is of the order of $\frac{1}{T}$, they can be accurately implemented in
DSP by using FIR filters operating on samples at a rate which is a suitable multiple of $\frac{1}{T}$. The
default choice of sampling rate in the labs is $\frac{4}{T}$, unless specified otherwise. If the filter is specified
in continuous time, typically, one simply samples the impulse response at this rate, taking a large
enough  filter  length  to  capture  most  of  the  energy  in  the  impulse  response.  Code  fragment  4.B.1 
in  the  appendix  illustrates  generating  a  discrete  time  filter  corresponding  to  a  truncated  raised 
cosine  pulse. 

Step  3:  Sending  the  symbols  through  the  filters. 

To send symbols at rate $\frac{1}{T}$ through filters implemented at rate $\frac{4}{T}$, it is necessary to upsample
the  symbols  before  convolving  them  with  the  filter  impulse  response  determined  in  Step  2.  Code 
fragment  4.B.2  in  the  appendix  illustrates  this  for  a  raised  cosine  pulse. 

Step  4:  Adding  noise 

Typically,  we  add  white  Gaussian  noise  (model  to  be  specified  in  a  later  lab)  at  the  input  to  the 
receive  filter. 

Step  5:  Processing  at  the  receive  filter  output 

If there is no intersymbol interference (ISI), the processing simply consists of sampling at rate $\frac{1}{T}$
to  get  decision  statistics  for  the  symbols  of  interest.  For  BPSK,  you  might  simply  take  the  sign 
of  the  decision  statistic  to  make  your  bit  decision. 

If  the  ISI  is  significant,  then  channel  equalization  (discussed  in  a  later  lab)  is  required  prior  to 
making  symbol  decisions. 
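
Steps 1 to 3 and 5 above can be sketched end to end for a noiseless ideal channel. This is not the appendix code: to keep the example self-contained it uses a triangular stand-in pulse (which is Nyquist at rate $1/T$) instead of an SRRC or raised cosine filter, and the oversampling factor `m = 4` and the variable names are assumptions.

```matlab
% Sketch of Steps 1-3 and 5 for BPSK over a noiseless ideal channel.
% Assumptions: m = 4 samples per symbol, and a triangular stand-in pulse
% (Nyquist at rate 1/T) in place of the SRRC/raised cosine filters.
m = 4;  N = 100;
b = 2 * randi([0 1], 1, N) - 1;     % Step 1: random +/-1 symbols

p = [1:m, m-1:-1:1] / m;            % Step 2: pulse samples, peak 1 at index m

up = zeros(1, N * m);               % Step 3: upsample by inserting zeros
up(1:m:end) = b;
tx = conv(up, p);                   % filter output at rate m/T

delay = m;                          % Step 5: index of the first symbol's peak
z = tx(delay : m : delay + (N - 1) * m);   % symbol-rate decision statistics
bhat = sign(z);                     % BPSK decisions
fprintf('symbol errors: %d\n', sum(bhat ~= b));
```

The sampling delay equals the index of the pulse peak, and the sample spacing equals the oversampling factor; with a longer filter (or a cascade of transmit and receive filters) the delay grows accordingly.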


Laboratory  Assignment 


0) Write a Matlab function analogous to Code Fragment 4.B.1 to generate an SRRC pulse (i.e.,
do Problem 4.13(a)) where you can specify the truncation length and the excess bandwidth.

1) Set the transmit filter to an SRRC pulse with excess bandwidth 22%, sampled at rate $4/T$
and truncated to $[-5T, 5T]$. Plot the impulse response of the transmit filter versus $t/T$.

If  you  have  difficulty  generating  the  SRRC  pulse,  use  the  following  code  fragment  to  generate 
the  transmit  filter: 


Code Fragment 4.7.1 (Explicit specification of transmit filter)

%first specify half of the filter
hhalf = [-0.025288315;-0.034167931;-0.035752323;-0.016733702;0.021602514;
0.064938487;0.091002137;0.081894974;0.037071157;-0.021998074;-0.060716277;
-0.051178658;0.007874526;0.084368728;0.126869306;0.094528345;-0.012839661;
-0.143477028;-0.211829088;-0.140513128;0.094601918;0.441387140;0.785875640;
1.0];
%the full filter is the half followed by its mirror image
transmit_filter = [hhalf;flipud(hhalf)];


2) Using the DFT (as in Code Fragment 2.5.1 for Example 2.5.4), compute the magnitude of
the transfer function of the transmit filter versus the normalized frequency $fT$ (make sure the
resolution in frequency is good enough to get a smooth plot, e.g., at least as good as $\frac{1}{64T}$). From
eyeballing the plot, check whether the normalized bandwidth (i.e., bandwidth as a multiple of
$\frac{1}{T}$) is well predicted by the nominal excess bandwidth.

3)  Use  the  transmit  filter  in  the  Code  Fragment  4.B.2,  which  implements  upsampling  and  allows 
sending  a  programmable  number  of  symbols  through  the  system.  Set  the  receive  filter  to  be  the 
matched  filter  corresponding  to  the  transmit  filter,  and  plot  the  response  at  the  output  of  the 
receive filter to a single symbol. Is the cascade of the transmit and receive filters Nyquist at
rate  1/T? 

4) Generate 100 random bits $\{a[n]\}$ taking values in $\{0, 1\}$, and map them to symbols $\{b[n]\}$
taking values in $\{-1, +1\}$, with 0 mapped to $+1$ and 1 to $-1$.

5) Send the 100 symbols $\{b[n]\}$ through the system. What is the length of the corresponding
output  of  the  transmit  filter?  What  is  the  length  of  the  corresponding  output  of  the  receive 
filter?  Plot  separately  the  input  to  the  receive  filter,  and  the  output  of  the  receive  filter  versus 
time,  with  one  unit  of  time  on  the  x-axis  equal  to  the  symbol  time  T. 

6) Do the best job you can in recovering the transmitted bits $\{a[n]\}$ by directly sampling the
input to the receive filter, and add lines in the Matlab code for implementing your idea. That
is,  select  a  set  of  100  samples,  and  estimate  the  100  transmitted  bits  based  on  the  sign  of  these 
samples.  (What  sampling  delay  and  spacing  would  you  use?).  Estimate  the  probability  of  error 
(note:  no  noise  has  been  added). 

7)  Do  the  best  job  you  can  in  recovering  the  transmitted  bits  by  directly  sampling  the  output  of 
the  receive  filter,  and  add  lines  in  the  Matlab  code  for  implementing  your  idea.  That  is,  select  a 
set  of  100  samples,  estimate  the  100  transmitted  bits  based  on  the  sign  of  these  samples.  (What 
sampling  delay  and  spacing  would  you  use?).  Estimate  the  probability  of  error.  Also  estimate 
the probability of error if you chose an incorrect delay, offset from the correct delay by $T/2$.

8) Suppose that the receiver LO used for downconversion is ahead in frequency and phase relative
to the incoming wave by a frequency offset $\Delta f$ and a phase of $\pi/2$. Modify your complex baseband model
to include the effects of the carrier phase and frequency offset. When you now sample at the
“correct” delay as determined in 7), do a scatter plot of the complex-valued samples $\{y[n], n =
1, ..., 100\}$ that you obtain. Can you make correct decisions based on taking the sign of the real
part of the samples, as in 7)?

9) Now consider a differentially encoded system in which we send {a[n], n = 1, ..., 99}, where
a[n] ∈ {0, 1}, by sending the following ±1 bits: b[1] = +1, and for n = 2, ..., 100

$$ b[n] = \begin{cases} b[n-1], & a[n] = 0 \\ -b[n-1], & a[n] = 1 \end{cases} $$

Devise estimates for the bits {a[n]} from the samples {y[n]} in 8), and estimate the probability
of  error. 

Hint: What does y[n]y*[n - 1] look like?
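
The hint can be explored with a noiseless, symbol-rate sketch such as the one below. It is only an illustration under stated assumptions: the normalized offset dfT = 0.025 is a hypothetical value (substitute the offset used in your own model), the samples are taken at the ideal delay with no intersymbol interference, and bit a(n-1) is assumed to decide whether b(n) flips relative to b(n-1).

```matlab
% Noiseless sketch of differential demodulation (illustrative values only)
nbits = 99;
a = double(rand(nbits,1) > 0.5);       % information bits in {0,1}
b = ones(nbits+1,1);                   % b(1) = +1 is the reference symbol
for n = 2:nbits+1
    b(n) = b(n-1) * (1 - 2*a(n-1));    % flip the sign when the bit is 1
end

dfT = 0.025;                           % hypothetical normalized frequency offset
theta = pi/2;                          % phase offset
n = (1:nbits+1)';
y = b .* exp(1j*(2*pi*dfT*n + theta)); % rotated symbol-rate samples

% y[n] y*[n-1] = b[n] b[n-1] exp(j*2*pi*dfT): the common phase cancels
z = y(2:end) .* conj(y(1:end-1));
a_hat = double(real(z) < 0);           % negative real part => sign flip => bit 1
nerrors = sum(a_hat ~= a);
```

Because the phase theta cancels in the product and the residual rotation 2*pi*dfT per symbol is small in this sketch, the sign of the real part of z indicates whether successive symbols differ.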
Lab Report: Your lab report should answer the preceding questions in order, and should
document the reasoning you used and the difficulties you encountered. Comment on whether
you get better error probability in 6) or 7), and why.


4.A Power spectral density of a linearly modulated signal

We  wish  to  compute  the  PSD  of  a  linearly  modulated  signal  of  the  form 

$$ u(t) = \sum_n b[n]\, p(t - nT) $$

While we model the complex-valued symbol sequence {b[n]} as random, we do not need to
invoke  concepts  from  probability  and  random  processes  to  compute  the  PSD,  but  can  simply 
model  time-averaged  quantities  for  the  symbol  sequence.  For  example,  the  DC  value,  which  is 
typically  designed  to  be  zero,  is  defined  by 

$$ \overline{b[n]} = \lim_{N \to \infty} \frac{1}{2N} \sum_{n=-N}^{N} b[n] \tag{4.28} $$


We also define the time-averaged autocorrelation function $R_b[k] = \overline{b[n]\, b^*[n-k]}$ for the symbol
sequence as the following limit:

$$ R_b[k] = \lim_{N \to \infty} \frac{1}{2N} \sum_{n=-N}^{N} b[n]\, b^*[n-k] \tag{4.29} $$

Note that we are being deliberately sloppy about the limits of summation in n on the right-hand
side to avoid messy notation. Actually, since $-N \le m = n - k \le N$, we have the constraint
$-N + k \le n \le N + k$ in addition to the constraint $-N \le n \le N$. Thus, the summation in
n should depend on the delay k at which we are evaluating the autocorrelation function, going
from n = -N to n = N + k for k < 0, and n = -N + k to n = N for k ≥ 0. However, we ignore
these edge effects, since they become negligible when we let N get large while keeping k fixed.

We  now  compute  the  time-averaged  PSD.  As  described  in  Section  4.2.1,  the  steps  for  computing 
the  PSD  for  a  finite-power  signal  u(t)  are  as  follows: 

(a) timelimit to a finite observation interval of length $T_o$ to get a finite energy signal $u_{T_o}(t)$;

(b) compute the Fourier transform $U_{T_o}(f)$, and hence obtain the energy spectral density $|U_{T_o}(f)|^2$;

(c) estimate the PSD as $\hat{S}_u(f) = \frac{|U_{T_o}(f)|^2}{T_o}$, and take the limit $T_o \to \infty$ to obtain $S_u(f)$.

Consider the observation interval [-NT, NT], which fits roughly 2N symbols. In general, the
modulation pulse p(t) need not be timelimited to the symbol duration T. However, we can
neglect  the  edge  effects  caused  by  this,  since  we  eventually  take  the  limit  as  the  observation 
interval  gets  large.  Thus,  we  can  write 


$$ u_{T_o}(t) \approx \sum_{n=-N}^{N} b[n]\, p(t - nT) $$

Taking  the  Fourier  transform,  we  obtain 

$$ U_{T_o}(f) = \sum_{n=-N}^{N} b[n]\, P(f)\, e^{-j2\pi f nT} $$
The energy spectral density is therefore given by

$$ |U_{T_o}(f)|^2 = U_{T_o}(f)\, U_{T_o}^*(f) = \sum_{n=-N}^{N} b[n]\, P(f)\, e^{-j2\pi f nT} \sum_{m=-N}^{N} b^*[m]\, P^*(f)\, e^{j2\pi f mT} $$

where we need to use two different dummy variables, n and m, for the summations corresponding
to $U_{T_o}(f)$ and $U_{T_o}^*(f)$, respectively. Thus,


$$ |U_{T_o}(f)|^2 = |P(f)|^2 \sum_{m=-N}^{N} \sum_{n=-N}^{N} b[n]\, b^*[m]\, e^{-j2\pi f (n-m) T} $$

and  the  PSD  is  estimated  as 


$$ \hat{S}_u(f) = \frac{|U_{T_o}(f)|^2}{2NT} = \frac{|P(f)|^2}{T} \left\{ \frac{1}{2N} \sum_{m=-N}^{N} \sum_{n=-N}^{N} b[n]\, b^*[m]\, e^{-j2\pi f (n-m) T} \right\} \tag{4.30} $$

Thus, the PSD factors into two components: the first is a term $\frac{|P(f)|^2}{T}$ that depends only on the
spectrum of the modulation pulse p(t), while the second term (in curly brackets) depends only
on the symbol sequence {b[n]}. Let us now work on simplifying the latter. Grouping terms of
the form m = n - k for each fixed k, we can rewrite this term as

$$ \frac{1}{2N} \sum_{m=-N}^{N} \sum_{n=-N}^{N} b[n]\, b^*[m]\, e^{-j2\pi f (n-m) T} = \sum_k e^{-j2\pi f kT}\, \frac{1}{2N} \sum_{n=-N}^{N} b[n]\, b^*[n-k] \tag{4.31} $$


From (4.29), we see that taking the limit $N \to \infty$ in (4.31) yields $\sum_k R_b[k]\, e^{-j2\pi f kT}$. Substituting
into (4.30), we obtain that the PSD is given by

$$ S_u(f) = \frac{|P(f)|^2}{T} \sum_k R_b[k]\, e^{-j2\pi f kT} \tag{4.32} $$

Thus, we see that the PSD depends both on the modulating pulse p(t) and on the properties
of the symbol sequence {b[n]}. We explore how the dependence on the symbol sequence can
be exploited for shaping the spectrum in the problems. However, for most systems, the symbol
sequence can be modeled as uncorrelated and zero mean. In this case, $R_b[k] = 0$ for $k \ne 0$.
Specializing to this important setting yields Theorem 4.2.1.
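
The time average in (4.29) can be tried out numerically. The following is a minimal sketch, assuming an independently generated ±1 symbol sequence (the variable names and the window size N are our own choices); it estimates $R_b[k]$ for a few lags, ignoring edge effects as in the text.

```matlab
% Time-averaged autocorrelation estimate in the spirit of (4.29)
N = 50000;                          % half-length of the averaging window
b = sign(rand(2*N+1,1) - 0.5);      % +/-1 symbols standing in for n = -N..N
kmax = 3;
Rb = zeros(kmax+1,1);               % Rb(k+1) holds the estimate at lag k
for k = 0:kmax
    % sum b[n] b*[n-k] over the n for which both indices fall in the window
    Rb(k+1) = sum(b(1+k:end) .* conj(b(1:end-k))) / (2*N);
end
disp(Rb');
```

For such a sequence the lag-0 estimate should be close to 1 and the other lags close to 0, in which case only the k = 0 term of (4.32) survives.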


4.B Simulation resource: bandlimited pulses and upsampling

The  discussion  in  this  appendix  should  be  helpful  for  Software  Lab  4.1.  In  order  to  simulate  a 
linearly  modulated  system,  we  must  specify  the  transmit  and  receive  filters,  typically  chosen  so 
that  their  cascade  is  Nyquist  at  the  symbol  rate.  As  mentioned  earlier,  there  are  two  popular 
choices.  One  choice  is  to  set  the  transmit  filter  to  a  Nyquist  pulse,  and  the  receive  filter  to  a 
wideband  pulse  that  has  response  roughly  flat  over  the  band  of  interest.  Another  is  to  set  the 
transmit and receive filters to be square roots (in the frequency domain) of a Nyquist pulse. We
discuss  software  implementations  of  both  choices  here. 
Consider the raised cosine pulse, which is the most common choice for bandlimited Nyquist pulses.
Setting the symbol rate 1/T = 1 without loss of generality (this is equivalent to expressing all
results in terms of t/T or fT), this pulse is given by


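$$ P(f) = \begin{cases} 1, & |f| \le \frac{1-a}{2} \\ \frac{1}{2}\left[1 + \cos\left(\frac{\pi}{a}\left(|f| - \frac{1-a}{2}\right)\right)\right], & \frac{1-a}{2} \le |f| \le \frac{1+a}{2} \\ 0, & \text{else} \end{cases} \tag{4.33} $$

The corresponding time domain pulse is given by

$$ p(t) = \mathrm{sinc}(t)\, \frac{\cos \pi a t}{1 - 4a^2t^2} \tag{4.34} $$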

where $0 \le a \le 1$ denotes the excess bandwidth. An example Matlab function for
generating a sampled version of the raised cosine pulse is provided below. Note that the code
must account for the zero in the denominator at $t = \pm\frac{1}{2a}$. It is left as an exercise to show, using
L’Hospital’s rule, that the 0/0 form taken by $\frac{\cos \pi a t}{1 - 4a^2t^2}$ at these times evaluates to $\frac{\pi}{4}$.
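
As a quick numerical sanity check on this limit (not a substitute for the exercise), one can evaluate the ratio just to either side of the singular point. The sketch below assumes a = 0.5 and an arbitrary small step delta; both are illustrative choices.

```matlab
% Numerical check of the 0/0 limit of cos(pi*a*t)/(1 - 4*a^2*t^2) at t = 1/(2a)
a = 0.5;                                         % excess bandwidth (example value)
t0 = 1/(2*a);                                    % location of the zero in the denominator
ratio = @(t) cos(pi*a*t) ./ (1 - 4*a^2*t.^2);
delta = 1e-4;                                    % small offset on either side of t0
approx = [ratio(t0 - delta), ratio(t0 + delta)];
disp([approx, pi/4]);                            % compare against pi/4
```

Both one-sided values should agree with the limit to within a relative error on the order of a*delta.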

Code  Fragment  4.B.1  (Sampled  raised  cosine  pulse) 

%time domain pulse for raised cosine, together with time vector to plot it against
%oversampling factor = how much faster than the symbol rate we sample at
%length = where to truncate response (multiple of symbol time) on each side of peak
%a = excess bandwidth

function [rc,time_axis] = raised_cosine(a,m,length)

length_os = floor(length*m); %number of samples on each side of peak
%time vector (in units of symbol interval) on one side of the peak
z = cumsum(ones(length_os,1))/m;

A = sin(pi*z)./(pi*z); %term 1
B = cos(pi*a*z); %term 2
C = 1 - (2*a*z).^2; %term 3

zerotest = m/(2*a); %location of zero in denominator
%check whether any sample coincides with zero location
%(the range check avoids indexing beyond the truncated response, e.g. for small a)
if (zerotest == floor(zerotest)) && (zerotest <= length_os),
B(zerotest) = pi*a;
C(zerotest) = 4*a;
%alternative is to perturb around the sample
%(find L'Hospital limit numerically)
%B(zerotest) = cos(pi*a*(z(zerotest)+0.001));
%C(zerotest) = 1-(2*a*(z(zerotest)+0.001))^2;
end

D = (A.*B)./C; %response to one side of peak
rc = [flipud(D);1;D]; %add in peak and other side
time_axis = [flipud(-z);0;z];

This  can,  for  example,  be  used  to  generate  a  plot  of  the  raised  cosine  pulse,  as  follows,  where  we 
would  typically  oversample  by  a  large  factor  (e.g.,  m  =  32)  in  order  to  get  a  smooth  plot. 

%plot time domain raised cosine pulse
a = 0.5; %desired excess bandwidth
m = 32; %oversample by a lot to get smooth plot
length = 10; %where to truncate the time domain response
%(one-sided, multiple of symbol time)
[rc,time] = raised_cosine(a,m,length);
plot(time,rc);

The code for the raised cosine function can also be used to generate the coefficients of a discrete
time transmit filter. Here, the oversampling factor would be dictated by our DSP-centric
implementation,  and  would  usually  be  far  less  than  what  is  required  for  a  smooth  plot:  the 
digital-to-analog  converter  would  perform  the  interpolation  required  to  provide  a  smooth  analog 
waveform  for  upconversion.  A  typical  choice  is  m  =  4,  as  in  the  Matlab  code  below  for  generating 
a  noiseless  BPSK  modulated  signal. 

Upsampling:  As  noted  in  our  preview  of  digital  modulation  in  Section  2.3.2,  the  symbols  come 
in  every  T  seconds,  while  the  samples  of  the  transmit  filter  are  spaced  by  T/m.  For  example, 
the nth symbol contributes b[n]p(t - nT) to the transmit filter output, and the (n+1)st symbol
contributes b[n+1]p(t - (n+1)T). Since p(t - nT) and p(t - (n+1)T) are offset by T, they
must be offset by m samples when sampling at a rate of m/T. Thus, if the symbols are input to
a transmit filter whose discrete time impulse response is expressed at sampling rate m/T, then
successive symbols at the input to the filter must be spaced by m samples. That is, in order to
get the output as a convolution of the symbols with the transmit filter expressed at rate m/T,
we must insert m - 1 zeros between successive symbols to convert them to a sampling rate of
m/T.

For completeness, we reproduce part of the upsampling Code Fragment 2.3.2 below in implementing
a raised cosine transmit filter.

Code  Fragment  4.B.2  (Sampled  transmitter  output) 

oversampling_factor = 4;
m = oversampling_factor;

%parameters for sampled raised cosine pulse
a = 0.5;
length = 10; %(truncated outside [-length*T,length*T])

%raised cosine transmit filter (time vector set to a dummy variable which is not used)
[transmit_filter,dummy] = raised_cosine(a,m,length);

%NUMBER OF SYMBOLS
nsymbols = 100;

%BPSK SYMBOL GENERATION
symbols = sign(rand(nsymbols,1) - .5);

%UPSAMPLE BY m
nsymbols_upsampled = 1+(nsymbols-1)*m; %length of upsampled symbol sequence
symbols_upsampled = zeros(nsymbols_upsampled,1); %initialize
symbols_upsampled(1:m:nsymbols_upsampled) = symbols; %insert symbols with spacing m

%NOISELESS MODULATED SIGNAL
tx_output = conv(symbols_upsampled,transmit_filter);
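
To see why inserting m - 1 zeros gives the right answer, the sketch below compares the convolution of the upsampled symbols against a direct superposition of shifted, scaled pulses. It is a self-contained illustration: a truncated sinc pulse stands in for the transmit filter (an assumption made to avoid depending on raised_cosine), and the variable names are our own.

```matlab
% Upsample-then-convolve versus direct superposition of shifted pulses
m = 4; L = 10; nsymbols = 100;
t = (-L*m:L*m)'/m;                        % time axis in units of the symbol time
p = ones(size(t));                        % stand-in pulse: truncated sinc
nz = (t ~= 0);
p(nz) = sin(pi*t(nz)) ./ (pi*t(nz));

symbols = sign(rand(nsymbols,1) - 0.5);   % BPSK symbols
up = zeros(1+(nsymbols-1)*m, 1);
up(1:m:end) = symbols;                    % m-1 zeros between successive symbols
tx = conv(up, p);

% each symbol contributes a copy of the pulse, delayed by m samples per symbol
tx_direct = zeros(length(up)+length(p)-1, 1);
for n = 1:nsymbols
    idx = ((n-1)*m + 1):((n-1)*m + length(p));
    tx_direct(idx) = tx_direct(idx) + symbols(n)*p;
end
max_diff = max(abs(tx - tx_direct));
```

The two outputs should agree up to rounding, and the output length equals the length of the upsampled sequence plus the filter length minus one.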

Let  us  now  discuss  the  implementation  of  an  alternative  transmit  filter,  the  square  root  raised 
cosine (SRRC). The frequency domain SRRC pulse is given by $G(f) = \sqrt{P(f)}$, where P(f) is
as  in  (4.33).  We  now  need  to  find  a  sampled  version  of  the  time  domain  pulse  g(t)  in  order  to 
implement  linear  modulation  as  above.  While  this  could  be  done  numerically  by  sampling  the 
frequency  domain  pulse  and  computing  an  inverse  DFT,  we  can  also  find  an  analytical  formula 
for  g(t),  as  follows.  Given  the  practical  importance  of  the  SRRC  pulse,  we  provide  the  formula 
and sketch its derivation. Noting that $1 + \cos 2\theta = 2\cos^2\theta$, we can rewrite the frequency domain
expression  (4.33)  for  the  raised  cosine  pulse  as 


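$$ P(f) = \begin{cases} 1, & |f| \le \frac{1-a}{2} \\ \cos^2\left(\frac{\pi}{2a}\left(|f| - \frac{1-a}{2}\right)\right), & \frac{1-a}{2} \le |f| \le \frac{1+a}{2} \\ 0, & \text{else} \end{cases} \tag{4.35} $$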

We  can  now  take  the  square  root  to  get  an  analytical  expression  for  the  SRRC  pulse  in 
frequency  domain  as  follows: 

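$$ G(f) = \begin{cases} 1, & |f| \le \frac{1-a}{2} \\ \cos\left(\frac{\pi}{2a}\left(|f| - \frac{1-a}{2}\right)\right), & \frac{1-a}{2} \le |f| \le \frac{1+a}{2} \\ 0, & \text{else} \end{cases} \qquad \text{Frequency domain SRRC pulse} \tag{4.36} $$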

Finding  the  time  domain  SRRC  pulse  is  now  a  matter  of  computing  the  inverse  Fourier  transform. 
Since  it  is  also  an  interesting  exercise  in  utilizing  Fourier  transform  properties,  we  sketch  the 
derivation.  First,  we  break  up  the  frequency  domain  pulse  into  segments  whose  inverse  Fourier 
transforms are well known. Setting $b = \frac{1-a}{2}$, we have


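$$ G(f) = G_1(f) + G_2(f) \tag{4.37} $$

where

$$ G_1(f) = I_{[-b,b]}(f) \leftrightarrow g_1(t) = 2b\,\mathrm{sinc}(2bt) = \frac{\sin(2\pi bt)}{\pi t} = \frac{\sin \pi(1-a)t}{\pi t} \tag{4.38} $$

and

$$ G_2(f) = U(f - b) + U(-f - b) \tag{4.39} $$

with

$$ U(f) = \cos\left(\frac{\pi}{2a} f\right) I_{[0,a]}(f) = \frac{1}{2}\left(e^{j\frac{\pi}{2a} f} + e^{-j\frac{\pi}{2a} f}\right) I_{[0,a]}(f) \tag{4.40} $$

To evaluate $g_2(t)$, note first that

$$ I_{[0,a]}(f) \leftrightarrow a\,\mathrm{sinc}(at)\, e^{j\pi at} \tag{4.41} $$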

Multiplication by $e^{j\frac{\pi}{2a} f}$ in the frequency domain corresponds to leftward time shift by $\frac{1}{4a}$, while
multiplication by $e^{-j\frac{\pi}{2a} f}$ corresponds to a rightward time shift by $\frac{1}{4a}$. From (4.40) and (4.41), we
therefore obtain that


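$$ U(f) = \cos\left(\frac{\pi}{2a} f\right) I_{[0,a]}(f) \leftrightarrow u(t) = \frac{a}{2}\,\mathrm{sinc}\left(a\left(t + \frac{1}{4a}\right)\right) e^{j\pi a\left(t + \frac{1}{4a}\right)} + \frac{a}{2}\,\mathrm{sinc}\left(a\left(t - \frac{1}{4a}\right)\right) e^{j\pi a\left(t - \frac{1}{4a}\right)} $$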


Simplifying, we obtain that

$$ u(t) = \frac{a}{\pi}\, \frac{2e^{j2\pi at} - j8at}{1 - 16a^2t^2} \tag{4.42} $$

Now,


$$ G_2(f) = U(f - b) + U(-f - b) \leftrightarrow g_2(t) = u(t)\,e^{j2\pi bt} + u^*(t)\,e^{-j2\pi bt} = \mathrm{Re}\left(2u(t)\,e^{j2\pi bt}\right) \tag{4.43} $$
Plugging in (4.42), and substituting the value of $b = \frac{1-a}{2}$, we obtain upon simplification that

$$ 2u(t)\,e^{j2\pi bt} = \frac{2a}{\pi}\, \frac{2e^{j\pi(1+a)t} - j8at\, e^{j\pi(1-a)t}}{1 - 16a^2t^2} $$


Taking  the  real  part,  we  obtain 


$$ g_2(t) = \frac{1}{\pi}\, \frac{4a\cos(\pi(1+a)t) + 16a^2t\,\sin(\pi(1-a)t)}{1 - 16a^2t^2} \tag{4.44} $$


Combining (4.38) and (4.44) and simplifying, we obtain the following expression for the SRRC
pulse $g(t) = g_1(t) + g_2(t)$:

$$ g(t) = \frac{4a\cos(\pi(1+a)t) + \frac{\sin(\pi(1-a)t)}{t}}{\pi(1 - 16a^2t^2)} \qquad \text{Time domain SRRC pulse} \tag{4.45} $$


We  leave  it  as  an  exercise  to  write  Matlab  code  to  generate  a  sampled  version  of  the  SRRC  pulse 
(analogous  to  Code  Fragment  4.B.1),  taking  into  account  the  zeros  in  the  denominator.  This 
can  then  be  used  to  generate  a  noiseless  transmit  waveform  as  in  Code  Fragment  4.B.2  simply 
by  replacing  the  transmit  filter  by  an  SRRC  pulse. 
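
One possible starting point for this exercise is sketched below. It is an illustration under stated assumptions rather than a reference solution: the 0/0 points of (4.45) at t = 0 and t = ±1/(4a) are handled by the numerical perturbation idea mentioned in Code Fragment 4.B.1 (the tolerance and shift values are arbitrary choices of ours), and the parameter values are examples.

```matlab
% Sampled SRRC pulse from (4.45), perturbing samples that land on 0/0 points
a = 0.5; m = 4; L = 10;             % excess bandwidth, oversampling, one-sided span
t = (-L*m:L*m)'/m;                  % time in units of the symbol interval
tol = 1e-9; shift = 1e-6;           % illustrative tolerance and perturbation
bad = (abs(t) < tol) | (abs(abs(t) - 1/(4*a)) < tol);
ts = t;
ts(bad) = ts(bad) + shift;          % move off the singular points slightly
g = (4*a*cos(pi*(1+a)*ts) + sin(pi*(1-a)*ts)./ts) ./ (pi*(1 - 16*a^2*ts.^2));

% cascade of two SRRC filters should be approximately Nyquist at the symbol rate
rc = conv(g, g)/m;                  % 1/m approximates the integral over time
center = 2*L*m + 1;
peak = rc(center);
isi = rc(center + m*(1:5));         % samples at nonzero multiples of the symbol time
```

If the pulse has been generated correctly, peak should be close to 1 and the entries of isi close to 0, up to truncation effects.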
Chapter 5

Probability  and  Random  Processes 


Probability theory is fundamental to communication system design, especially for digital communication.
Not only are there uncontrolled sources of uncertainty such as noise, interference, and
other  channel  impairments  that  are  only  amenable  to  statistical  modeling,  but  the  very  notion  of 
information  underlying  digital  communication  is  based  on  uncertainty.  In  particular,  the  receiver 
in  a  communication  system  does  not  know  a  priori  what  the  transmitter  is  sending  (otherwise 
the  transmission  would  be  pointless),  hence  the  receiver  designer  must  employ  statistical  models 
for  the  transmitted  signal.  In  this  chapter,  we  review  basic  concepts  of  probability  and  random 
variables with examples motivated by communications applications. We also introduce the concept
of random processes, which are used to model both signals and noise in communication
systems. 

Chapter  Plan:  The  goal  of  this  chapter  is  to  develop  the  statistical  modeling  tools  required  in 
later  chapters.  For  readers  who  are  already  comfortable  with  probability  and  random  processes, 
the  shortest  path  to  Chapter  6  is  to  review  the  material  on  Gaussian  random  variables  in  Section 
5.6  and  noise  modeling  in  Section  5.8.  Sections  5.1  through  5.5  provide  a  review  of  background 
material  on  probability  and  random  variables.  Section  5.1  discusses  basic  concepts  of  probability: 
the  most  important  of  these  for  our  purpose  are  the  concepts  of  conditional  probability  and  Bayes’ 
rule.  Sections  5.2  and  5.4  discuss  random  variables  and  functions  of  random  variables.  Multiple 
random  variables,  or  random  vectors,  are  discussed  in  Section  5.3.  Section  5.5  discusses  various 
statistical  averages  and  their  computation.  Material  which  is  not  part  of  the  assumed  background 
starts  with  Section  5.6;  this  section  goes  in  depth  into  Gaussian  random  variables  and  vectors, 
which  play  a  critical  role  in  the  mathematical  modeling  of  communication  systems.  Section  5.7 
introduces  random  processes  in  sufficient  depth  that  we  can  describe,  and  perform  elementary 
computations  with,  the  classical  white  Gaussian  noise  (WGN)  model  in  Section  5.8.  At  this 
point,  zealous  followers  of  a  “just  in  time”  philosophy  can  move  on  to  the  discussion  of  optimal 
receiver design in Chapter 6. However, many others might wish to go through one more section,
Section 5.9, which provides a more general treatment of the effect of linear operations on random
processes.  The  results  in  this  section  allow  us,  for  example,  to  model  noise  correlations  and  to 
compute  quantities  such  as  signal-to-noise  ratio  (SNR).  Material  which  we  do  not  build  on  in 
later  chapters,  but  which  may  be  of  interest  to  some  readers,  is  placed  in  the  appendices:  this 
includes  limit  theorems,  qualitative  discussion  of  noise  mechanisms,  discussion  of  the  structure 
of  passband  random  processes,  and  quantification,  via  SNR  computations,  of  the  effect  of  noise 
on  analog  modulation. 


5.1  Probability  Basics 

In  this  section,  we  remind  ourselves  of  some  important  definitions  and  properties. 
Sample Space: The starting point in probability is the notion of an experiment whose outcome
is not deterministic. The set of all possible outcomes from the experiment is termed the sample
space $\Omega$. For example, the sample space corresponding to the throwing of a six-sided die is
$\Omega = \{1, 2, 3, 4, 5, 6\}$. An analogous example which is well-suited to our purpose is the sequence
of bits sent by the transmitter in a digital communication system, modeled probabilistically by
the receiver. For example, suppose that the transmitter can send a sequence of seven bits, each
taking the value 0 or 1. Then our sample space consists of the $2^7 = 128$ possible bit sequences.

Event:  Events  are  sets  of  possible  outcomes  to  which  we  can  assign  a  probability.  That  is,  an 
event  is  a  subset  of  the  sample  space.  For  example,  for  a  six-sided  die,  the  event  {1,3,5}  is  the 
set  of  odd-numbered  outcomes. 


Figure  5.1:  Basic  set  operations. 


We are often interested in probabilities of events obtained from other events by basic set operations
such as complementation, unions and intersections; see Figure 5.1.

Complement of an Event (“NOT”): For an event A, the complement (“not A”), denoted
by $A^c$, is the set of outcomes that do not belong to A.

Union of Events (“OR”): The union of two events A and B, denoted by $A \cup B$, is the set
of all outcomes that belong to either A or B. The term “or” always refers to the inclusive or,
unless we specify otherwise. Thus, outcomes belonging to both events are included in the union.

Intersection of Events (“AND”): The intersection of two events A and B, denoted by $A \cap B$,
is the set of all outcomes that belong to both A and B.

Mutually Exclusive, or Disjoint, Events: Events A and B are mutually exclusive, or disjoint,
if their intersection is empty: $A \cap B = \emptyset$.

Difference of Events: The difference $A \setminus B$ is the set of all outcomes that belong to A but not
to B. In other words, $A \setminus B = A \cap B^c$.

Probability  Measure:  A  probability  measure  is  a  function  that  assigns  probability  to  events. 
Some  properties  are  as  follows. 

Range of probability: For any event A, we have $0 \le P[A] \le 1$. The probability of the empty
set is zero: $P[\emptyset] = 0$. The probability of the entire sample space is one: $P[\Omega] = 1$.

Probabilities  of  disjoint  events  add  up:  If  two  events  A  and  B  are  mutually  exclusive,  then 
the probability of their union equals the sum of their probabilities.

$$ P[A \cup B] = P[A] + P[B] \quad \text{if } A \cap B = \emptyset \tag{5.1} $$


By mathematical induction, we can infer that the probability of the union of a finite number of
pairwise disjoint events also adds up. It is useful to review the principle of mathematical induction
via this example. Specifically, suppose that we are given pairwise disjoint events $A_1, A_2, A_3, \ldots$.
We wish to prove that, for any $n \ge 2$,

$$ P[A_1 \cup A_2 \cup \ldots \cup A_n] = P[A_1] + \ldots + P[A_n] \quad \text{if } A_i \cap A_j = \emptyset \text{ for all } i \ne j \tag{5.2} $$

Mathematical  induction  consists  of  the  following  steps: 

(a)  verify  that  the  result  is  true  for  the  initial  value  of  n,  which  in  our  case  is  n  =  2; 

(b) assume that the result is true for an arbitrary value of n = k;

(c)  use  (a)  and  (b)  to  prove  that  the  result  is  true  for  n  =  k  +  1. 

In  our  case,  step  (a)  does  not  require  any  work;  it  holds  by  virtue  of  our  assumption  of  (5.1). 
Now, assume that (5.2) holds for n = k. Then

$$ A_1 \cup A_2 \cup \ldots \cup A_k \cup A_{k+1} = B \cup A_{k+1} $$

where

$$ B = A_1 \cup A_2 \cup \ldots \cup A_k $$

and $A_{k+1}$ are disjoint. We can therefore conclude, using step (a), that

$$ P[B \cup A_{k+1}] = P[B] + P[A_{k+1}] $$

But using step (b), we know that

$$ P[B] = P[A_1 \cup A_2 \cup \ldots \cup A_k] = P[A_1] + \ldots + P[A_k] $$

We can now conclude that

$$ P[A_1 \cup A_2 \cup \ldots \cup A_k \cup A_{k+1}] = P[A_1] + \ldots + P[A_{k+1}] $$


thus  accomplishing  step  (c). 

The  preceding  properties  are  typically  stated  as  axioms,  which  provide  the  starting  point  from 
which  other  properties,  some  of  which  are  stated  below,  can  be  derived. 

Probability of the complement of an event: The probabilities of an event and its complement
sum to one. By definition, A and $A^c$ are disjoint, and $A \cup A^c = \Omega$. Since $P[\Omega] = 1$, we can
now apply (5.1) to infer that

$$ P[A] + P[A^c] = 1 \tag{5.3} $$

Probabilities of unions and intersections: We can use the property (5.1) to infer the following
property regarding the union and intersection of arbitrary events:

$$ P[A_1 \cup A_2] = P[A_1] + P[A_2] - P[A_1 \cap A_2] \tag{5.4} $$


Let us get a feel for how to use the probability axioms by proving this. We break $A_1 \cup A_2$ into
disjoint events as follows:

$$ A_1 \cup A_2 = A_2 \cup (A_1 \setminus A_2) $$

Applying (5.1), we have

$$ P[A_1 \cup A_2] = P[A_2] + P[A_1 \setminus A_2] \tag{5.5} $$
Furthermore, since $A_1$ can be written as the disjoint union $A_1 = (A_1 \cap A_2) \cup (A_1 \setminus A_2)$, we have
$P[A_1] = P[A_1 \cap A_2] + P[A_1 \setminus A_2]$, or $P[A_1 \setminus A_2] = P[A_1] - P[A_1 \cap A_2]$. Plugging into (5.5), we
obtain (5.4).
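
The identities (5.4) and (5.5) can be checked on a small example. The sketch below assumes a fair six-sided die (equally likely outcomes) and two events chosen purely for illustration.

```matlab
% Checking (5.4) and (5.5) for two events on a fair six-sided die
omega = 1:6;                           % sample space
A1 = [1 3 5];                          % odd-numbered outcomes
A2 = [1 2 3];                          % outcomes no larger than 3 (illustrative choice)
P = @(E) numel(E) / numel(omega);      % equally likely outcomes (assumption)

lhs = P(union(A1, A2));                              % P[A1 U A2]
rhs_54 = P(A1) + P(A2) - P(intersect(A1, A2));       % right-hand side of (5.4)
rhs_55 = P(A2) + P(setdiff(A1, A2));                 % right-hand side of (5.5)
disp([lhs, rhs_54, rhs_55]);
```

All three numbers coincide, since the union here is {1, 2, 3, 5}.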


Conditional probability: The conditional probability of A given B is the probability of A
assuming that we already know that the outcome of the experiment is in B. Outcomes corresponding
to this probability must therefore belong to the intersection $A \cap B$. We therefore define
the conditional probability as

$$ P[A|B] = \frac{P[A \cap B]}{P[B]} \tag{5.6} $$


(We  assume  that  P[B]  >  0,  otherwise  the  condition  we  are  assuming  cannot  occur.) 

Conditional  probabilities  behave  just  the  same  as  regular  probabilities,  since  all  we  are  doing  is 
restricting the sample space to the event being conditioned on. Thus, we still have $P[A|B] = 1 - P[A^c|B]$ and

$$ P[A_1 \cup A_2 | B] = P[A_1|B] + P[A_2|B] - P[A_1 \cap A_2 | B] $$


Conditioning is a crucial concept in models for digital communication systems. A typical application
is to condition on which of a number of possible transmitted signals is sent, in order
to  describe  the  statistical  behavior  of  the  communication  medium.  Such  statistical  models  then 
form  the  basis  for  receiver  design  and  performance  analysis. 


Example  5.1.1  (a  binary  channel): 


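[Diagram: transmitted bit on the left, received bit on the right, with arrows labeled by the transition probabilities, e.g. 1-a.]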


Figure  5.2:  Conditional  probabilities  modeling  a  binary  channel. 


Figure  5.2  depicts  the  conditional  probabilities  for  a  noisy  binary  channel.  On  the  left  side  are 
the  two  possible  values  of  the  bit  sent,  and  on  the  right  are  the  two  possible  values  of  the  bit 
received.  The  labels  on  a  given  arrow  are  the  conditional  probability  of  the  received  bit,  given 
the  transmitted  bit.  Thus,  the  binary  channel  is  defined  by  means  of  the  following  conditional 
probabilities: 


P[0 received | 0 transmitted] = 1 - a, P[1 received | 0 transmitted] = a;

P[0 received | 1 transmitted] = b, P[1 received | 1 transmitted] = 1 - b

These conditional probabilities are often termed the channel transition probabilities. The probabilities
a and b are called the crossover probabilities. When a = b, we obtain the binary symmetric
channel. 

Law of total probability: For events A and B, we have

$$ P[A] = P[A \cap B] + P[A \cap B^c] = P[A|B]P[B] + P[A|B^c]P[B^c] \tag{5.7} $$
In the above, we have decomposed an event of interest, A, into a disjoint union of two events,
$A \cap B$ and $A \cap B^c$, so that (5.1) applies. The sets B and $B^c$ form a partition of the entire sample
space; that is, they are disjoint, and their union equals $\Omega$. This generalizes to any partition of
the sample space; that is, if $B_1, B_2, \ldots$ are mutually exclusive events such that their union covers
the sample space (actually, it is enough if the union contains A), then

$$ P[A] = \sum_i P[A \cap B_i] = \sum_i P[A|B_i]P[B_i] \tag{5.8} $$


Example 5.1.2 (Applying the law of total probability to the binary channel): For the
channel in Figure 5.2, set a = 0.1 and b = 0.25, and suppose that the probability of transmitting
0 is 0.6. This is called the prior, or a priori, probability of transmitting 0, because it is the
statistical information that the receiver has before it sees the received bit. Using (5.3), the prior
probability of 1 being transmitted follows from

P[0 transmitted] = 0.6 = 1 - P[1 transmitted]

so that P[1 transmitted] = 0.4

(since  sending  0  or  1  are  our  only  options  for  this  particular  channel  model,  the  two  events  are 
complements  of  each  other).  We  can  now  compute  the  probability  that  0  is  received  using  the 
law  of  total  probability,  as  follows. 

P[0 received]
= P[0 received | 0 transmitted] P[0 transmitted] + P[0 received | 1 transmitted] P[1 transmitted]
= 0.9 × 0.6 + 0.25 × 0.4 = 0.64


We  can  also  compute  the  probability  that  1  is  received  using  the  same  technique,  but  it  is  easier 
to  infer  this  from  (5.3)  as  follows: 

P[1  received]  =  1  —  P[0  received]  =  0.36 
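
The same computation can be organized as a matrix product, and cross-checked by simulation. The sketch below uses the values of this example; the matrix layout, variable names, and number of trials are our own choices.

```matlab
% Law of total probability for the binary channel, plus a Monte Carlo check
a = 0.1; b = 0.25; p0 = 0.6;
T = [1-a, a; b, 1-b];        % rows: transmitted 0/1, columns: received 0/1
prior = [p0, 1-p0];          % [P[0 transmitted], P[1 transmitted]]
p_rx = prior * T;            % [P[0 received], P[1 received]] via (5.8)

ntrials = 100000;
tx = double(rand(ntrials,1) > p0);            % 1 with probability 1 - p0
flip_prob = a*(tx == 0) + b*(tx == 1);        % crossover probability seen by each bit
flips = double(rand(ntrials,1) < flip_prob);  % 1 where the channel flips the bit
rx = mod(tx + flips, 2);
freq0 = mean(rx == 0);                        % empirical fraction of received zeros
disp([p_rx(1), freq0]);
```

The empirical fraction freq0 should be close to the computed value p_rx(1), with a discrepancy that shrinks as ntrials grows.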


Bayes’ rule: Given P[A|B], we compute P[B|A] as follows:

$$ P[B|A] = \frac{P[A|B]P[B]}{P[A]} = \frac{P[A|B]P[B]}{P[A|B]P[B] + P[A|B^c]P[B^c]} \tag{5.9} $$


where we have used (5.7). Similarly, in the setting of (5.8), we can compute $P[B_j|A]$ as follows:

$$ P[B_j|A] = \frac{P[A|B_j]P[B_j]}{P[A]} = \frac{P[A|B_j]P[B_j]}{\sum_i P[A|B_i]P[B_i]} \tag{5.10} $$


Bayes’  rule  is  typically  used  as  follows  in  digital  communication.  The  event  B  might  correspond 
to  which  transmitted  signal  was  sent.  The  event  A  may  describe  the  received  signal,  so  that 
P[A|B] can be computed based on our model for the statistics of the received signal, given
the  transmitted  signal.  Bayes’  rule  can  then  be  used  to  compute  the  conditional  probability 
P[B\A]  of  a  given  signal  having  been  transmitted,  given  information  about  the  received  signal, 
as  illustrated  in  the  example  below. 


Example  5.1.3  (Applying  Bayes’  rule  to  the  binary  channel):  Continuing  with  the  binary 
channel  of  Figure  5.2  with  a  =  0.1,  b  =  0.25,  let  us  find  the  probability  that  0  was  transmitted, 
given  that  0  is  received.  This  is  called  the  posterior,  or  a  posteriori,  probability  of  0  being 
transmitted,  because  it  is  the  statistical  model  that  the  receiver  infers  after  it  sees  the  received 
bit. As in Example 5.1.2, we assume that the prior probability of 0 being transmitted is 0.6. We
now apply Bayes’ rule as follows:

$$ P[0 \text{ transmitted} \mid 0 \text{ received}] = \frac{P[0 \text{ received} \mid 0 \text{ transmitted}]\, P[0 \text{ transmitted}]}{P[0 \text{ received}]} = \frac{0.9 \times 0.6}{0.64} = \frac{27}{32} $$

where we have used the computation from Example 5.1.2, based on the law of total probability,
for the denominator. We can also compute the posterior probability of the complementary event
as follows:

$$ P[1 \text{ transmitted} \mid 0 \text{ received}] = 1 - P[0 \text{ transmitted} \mid 0 \text{ received}] = \frac{5}{32} $$

These  results  make  sense.  Since  the  binary  channel  in  Figure  5.2  has  a  small  probability  of  error, 
it  is  much  more  likely  that  0  was  transmitted  than  that  1  was  transmitted  when  we  receive  0.  The 
situation  would  be  reversed  if  1  were  received.  The  computation  of  the  corresponding  posterior 
probabilities  is  left  as  an  exercise.  Note  that,  for  this  example,  the  numerical  values  for  the 
posterior  probabilities  may  be  different  when  we  condition  on  1  being  received,  since  the  channel 
transition  probabilities  and  prior  probabilities  are  not  symmetric  with  respect  to  exchanging  the 
roles  of  0  and  1. 
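
A compact way to organize these posterior computations is to form the joint probabilities of (transmitted, received) pairs and normalize each column by the probability of the received bit. The following sketch does this for the values of this example; the matrix layout and names are our own.

```matlab
% Posterior probabilities for the binary channel via Bayes' rule
a = 0.1; b = 0.25; p0 = 0.6;
T = [1-a, a; b, 1-b];                  % rows: transmitted 0/1, columns: received 0/1
prior = [p0; 1-p0];
joint = diag(prior) * T;               % joint(i,j) = P[tx = i-1, rx = j-1]
p_rx = sum(joint, 1);                  % law of total probability, one entry per received bit
posterior = joint * diag(1 ./ p_rx);   % posterior(i,j) = P[tx = i-1 | rx = j-1]
disp(posterior);
```

The first column reproduces 27/32 and 5/32; the second column gives the posteriors when 1 is received, which can be used to check the exercise.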

Two other concepts that we use routinely are independence and conditional independence.

Independence: Events $A_1$ and $A_2$ are independent if

$$ P[A_1 \cap A_2] = P[A_1]P[A_2] \tag{5.11} $$

Example 5.1.4 (independent bits): Suppose we transmit three bits. Each time, the probability of sending 0 is 0.6. Assuming that the bits to be sent are selected independently each of these three times, we can compute the probability of sending any given three-bit sequence using (5.11).

$$P[000 \text{ transmitted}] = P[\text{first bit} = 0, \text{second bit} = 0, \text{third bit} = 0]$$

$$= P[\text{first bit} = 0]\,P[\text{second bit} = 0]\,P[\text{third bit} = 0] = 0.6^3 = 0.216$$

Let us do a few other computations similarly, where we now use the shorthand $P[x_1 x_2 x_3]$ to denote that $x_1 x_2 x_3$ is the sequence of three bits transmitted.

$$P[101] = 0.4 \times 0.6 \times 0.4 = 0.096$$

and

$$P[\text{two ones transmitted}] = P[110] + P[101] + P[011] = 3 \times (0.4)^2 \times 0.6 = 0.288$$
The  number  of  ones  is  actually  a  binomial  random  variable  (reviewed  in  Section  5.2). 
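
As a quick numerical check of Example 5.1.4, the following sketch enumerates all three-bit sequences and multiplies per-bit probabilities, as (5.11) permits for independently chosen bits. The function and variable names are hypothetical.

```python
from itertools import product

P_BIT = {0: 0.6, 1: 0.4}  # per-bit probabilities from Example 5.1.4


def seq_prob(bits):
    """Probability of a bit sequence when bits are chosen independently."""
    prob = 1.0
    for b in bits:
        prob *= P_BIT[b]
    return prob


p_000 = seq_prob((0, 0, 0))
p_101 = seq_prob((1, 0, 1))
# Sum over the mutually exclusive sequences that contain exactly two ones.
p_two_ones = sum(
    seq_prob(s) for s in product((0, 1), repeat=3) if sum(s) == 2
)
print(round(p_000, 3), round(p_101, 3), round(p_two_ones, 3))
```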

Conditional Independence: Events $A_1$ and $A_2$ are conditionally independent given $B$ if

$$P[A_1 \cap A_2 \mid B] = P[A_1 \mid B]\,P[A_2 \mid B] \qquad (5.12)$$

Example  5.1.5  (independent  channel  uses):  Now,  suppose  that  we  transmit  three  bits, 
with each bit seeing the binary channel depicted in Figure 5.2. We say that the channel is memoryless when the value of the received bit corresponding to a given channel use is conditionally
independent  of  the  other  received  bits,  given  the  transmitted  bits.  For  the  setting  of  Example 
5.1.4,  where  we  choose  the  transmitted  bits  independently,  the  following  example  illustrates  the 
computation  of  conditional  probabilities  for  the  received  bits. 

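$$P[100 \text{ received} \mid 010 \text{ transmitted}] = P[1 \text{ received} \mid 0 \text{ transmitted}]\,P[0 \text{ received} \mid 1 \text{ transmitted}]\,P[0 \text{ received} \mid 0 \text{ transmitted}]$$

$$= 0.1 \times 0.25 \times 0.9 = 0.0225$$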
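
The memoryless property turns the conditional probability of a received sequence into a product over channel uses. A minimal sketch, assuming the transition probabilities used above (with P[1 received | 1 transmitted] = 0.75 inferred as the complement of 0.25); the names are hypothetical:

```python
from itertools import product

# P[received | transmitted], keyed as (received, transmitted).
P_RX_GIVEN_TX = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.25, (1, 1): 0.75}


def p_received_given_sent(received, sent):
    """P[received sequence | sent sequence] for a memoryless channel."""
    prob = 1.0
    for r, t in zip(received, sent):
        prob *= P_RX_GIVEN_TX[(r, t)]  # one factor per channel use
    return prob


p_example = p_received_given_sent((1, 0, 0), (0, 1, 0))
print(round(p_example, 4))  # 0.0225
```

Summing p_received_given_sent over all eight possible received sequences, for a fixed transmitted sequence, should give one.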
We end this section with a mention of two useful bounding techniques.

Union bound: The probability of a union of events is upper bounded by the sum of the probabilities of the events.

$$P[A_1 \cup A_2] \le P[A_1] + P[A_2] \qquad (5.13)$$

This follows from (5.4) by noting that $P[A_1 \cap A_2] \ge 0$. This property generalizes to a union of a collection of events by mathematical induction:


$$P\left[\bigcup_{i=1}^{n} A_i\right] \le \sum_{i=1}^{n} P[A_i] \qquad (5.14)$$


If $A$ implies $B$, then $P[A] \le P[B]$: An event $A$ implies an event $B$ (denoted by $A \rightarrow B$) if and only if $A$ is contained in $B$ (i.e., $A \subseteq B$). In this case, we can write $B$ as a disjoint union as follows: $B = A \cup (B \setminus A)$. This means that $P[B] = P[A] + P[B \setminus A] \ge P[A]$, since $P[B \setminus A] \ge 0$.


5.2  Random  Variables 


Figure  5.3:  A  random  variable  is  a  mapping  from  the  sample  space  to  the  real  line. 


A  random  variable  assigns  a  number  to  each  outcome  of  a  random  experiment.  That  is,  a 
random  variable  is  a  mapping  from  the  sample  space  to  the  set  of  real  numbers,  as  shown  in 
Figure  5.3.  The  underlying  experiment  that  leads  to  the  outcomes  in  the  sample  space  can  be 
quite  complicated  (e.g.,  generation  of  a  noise  sample  in  a  communication  system  may  involve 
the  random  movement  of  a  large  number  of  charge  carriers,  as  well  as  the  filtering  operation 
performed  by  the  receiver).  However,  we  do  not  need  to  account  for  these  underlying  physical 
phenomena  in  order  to  specify  the  probabilistic  description  of  the  random  variable.  All  we  need 
to  do  is  to  describe  how  to  compute  the  probabilities  of  the  random  variable  taking  on  a  particular 
set  of  values.  In  other  words,  we  need  to  specify  its  probability  distribution,  or  probability  law. 
Consider,  for  example,  the  Bernoulli  random  variable,  which  may  be  used  to  model  random  bits 
sent  by  a  transmitter,  or  to  indicate  errors  in  these  bits  at  the  receiver. 

Bernoulli random variable: $X$ is a Bernoulli random variable if it takes values 0 or 1. The probability distribution is specified if we know $P[X = 0]$ and $P[X = 1]$. Since $X$ can take only one of these two values, the events $\{X = 0\}$ and $\{X = 1\}$ constitute a partition of the sample space, so that $P[X = 0] + P[X = 1] = 1$. We therefore can characterize the Bernoulli distribution by a parameter $p \in [0, 1]$, where $p = P[X = 1] = 1 - P[X = 0]$. We denote this distribution as Bernoulli($p$).

In  general,  if  a  random  variable  takes  only  a  discrete  set  of  values,  then  its  distribution  can  be 
specified  simply  by  specifying  the  probabilities  that  it  takes  each  of  these  values. 

Discrete Random Variable, Probability Mass Function: $X$ is a discrete random variable if it takes a finite, or countably infinite, number of values. If $X$ takes values $x_1, x_2, \ldots$, then its probability distribution is characterized by its probability mass function (PMF), or the probabilities $p_i = P[X = x_i]$, $i = 1, 2, \ldots$. These probabilities must add up to one, $\sum_i p_i = 1$, since the events $\{X = x_i\}$, $i = 1, 2, \ldots$ provide a partition of the sample space.

For  random  variables  that  take  values  in  a  continuum,  the  probability  of  taking  any  particular 
value  is  zero.  Rather,  we  seek  to  specify  the  probability  that  the  value  taken  by  the  random 
variable  falls  in  a  given  set  of  interest.  By  choosing  these  sets  to  be  intervals  whose  size  shrinks 
to  zero,  we  arrive  at  the  notion  of  probability  density  function,  as  follows. 

Continuous  Random  Variable,  Probability  Density  Function:  X  is  a  continuous  random 
variable  if  the  probability  P[X  =  x]  is  zero  for  each  x.  In  this  case,  we  define  the  probability 
density  function  (PDF)  as  follows: 


$$p_X(x) = \lim_{\Delta x \to 0} \frac{P[x \le X \le x + \Delta x]}{\Delta x} \qquad (5.15)$$

In other words, for small intervals, we have the approximate relationship:

$$P[x \le X \le x + \Delta x] \approx p_X(x)\,\Delta x$$

Expressing an event of interest as a disjoint union of such small intervals, the probability of the event is the sum of the probabilities of these intervals; as we let the length of the intervals shrink, the sum becomes an integral (with $\Delta x$ replaced by $dx$). Thus, the probability of $X$ taking values in a set $A$ can be computed by integrating its PDF over $A$, as follows:

$$P[X \in A] = \int_A p_X(x)\,dx \qquad (5.16)$$

The PDF must integrate to one over the real line, since any value taken by $X$ falls within this interval:

$$\int_{-\infty}^{\infty} p_X(x)\,dx = 1$$

Notation: We use the notation $p_X(x)$ to denote the density of a random variable $X$, evaluated at the point $x$. Thus, the argument of the density is a dummy variable, and could be denoted by some other letter: for example, we could write $p_X(u)$ for the density of $X$, evaluated at the point $u$. Once we firmly establish these concepts, however, we plan to allow ourselves to get sloppy. As discussed in the note at the end of Section 5.3, if there is no scope for confusion, we plan to use the dummy variable to also denote the random variable we are talking about. For example, we use $p(x)$ as the notation for $p_X(x)$ and $p(y)$ as the notation for $p_Y(y)$. But for now, we retain the subscripts in the introductory material in Sections 5.2 and 5.3.


Density:  We  use  the  generic  term  “density”  to  refer  to  both  PDF  and  PMF  (but  more  often 
the  PDF),  relying  on  the  context  to  clarify  what  we  mean  by  the  term. 

The  PMF  or  PDF  cannot  be  used  to  describe  mixed  random  variables  that  are  neither  discrete 
nor  continuous.  We  can  get  around  this  problem  by  allowing  PDFs  to  contain  impulses,  but  a 
general  description  of  the  probability  distribution  of  any  random  variable,  whether  it  is  discrete, 
continuous  or  mixed,  can  be  provided  in  terms  of  its  cumulative  distribution  function,  defined 
below. 


Cumulative distribution function (CDF): The CDF of a random variable $X$ is defined as

$$F_X(x) = P[X \le x]$$

and has the following general properties:
(1) $F_X(x)$ is nondecreasing in $x$.
This is because, for $x_1 \le x_2$, we have $\{X \le x_1\} \subseteq \{X \le x_2\}$, so that $P[X \le x_1] \le P[X \le x_2]$.

(2) $F_X(-\infty) = 0$ and $F_X(\infty) = 1$.

The event $\{X \le -\infty\}$ contains no allowable values for $X$, and is therefore the empty set, which has probability zero. The event $\{X \le \infty\}$ contains all allowable values for $X$, and is therefore the entire sample space, which has probability one.

(3) $F_X(x)$ is right-continuous: $F_X(x) = \lim_{\delta \to 0,\, \delta > 0} F_X(x + \delta)$. Denoting this right limit as $F_X(x^+)$, we can state the property compactly as $F_X(x) = F_X(x^+)$.

The  proof  is  omitted,  since  it  requires  going  into  probability  theory  at  a  depth  that  is  unnecessary 
for  our  purpose. 

Any function that satisfies (1)-(3) is a valid CDF. The CDFs for discrete and mixed random variables exhibit jumps. At each of these jumps, the left limit $F_X(x^-)$ is strictly smaller than the right limit $F_X(x^+) = F_X(x)$. Since

$$P[X = x] = P[X \le x] - P[X < x] = F_X(x) - F_X(x^-) \qquad (5.17)$$

we see that the jumps correspond to the discrete set of points where nonzero probability mass is assigned. For a discrete random variable, the CDF remains constant between these jumps. The PMF is given by applying (5.17) for $x = x_i$, $i = 1, 2, \ldots$, where $\{x_i\}$ is the set of values taken by $X$.
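
To make the relationship between CDF jumps and the PMF concrete, here is a small sketch for a hypothetical discrete random variable (the number of ones in two independent bits, each equal to 1 with probability 0.4). The left limit in (5.17) is approximated by evaluating the CDF slightly to the left of the point; the names and the step size eps are illustrative assumptions.

```python
values = [0, 1, 2]
pmf = [0.36, 0.48, 0.16]  # hypothetical PMF; sums to one


def cdf(x):
    """F_X(x) = P[X <= x] for the discrete random variable above."""
    return sum(p for v, p in zip(values, pmf) if v <= x)


def jump(x, eps=1e-9):
    """Approximate F_X(x) - F_X(x^-), i.e. P[X = x], as in (5.17)."""
    return cdf(x) - cdf(x - eps)


for v in values:
    print(v, round(jump(v), 2))   # recovers the PMF at each jump
print(jump(0.5))                  # no jump between the values taken by X
```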

For a continuous random variable, there are no jumps in the CDF, since $P[X = x] = 0$ for all $x$. That is, a continuous random variable can be defined as one whose CDF is a continuous function. From the definition (5.15) of PDF, it is clear that the PDF of a continuous random variable is the derivative of the CDF; that is,

$$p_X(x) = F_X'(x) \qquad (5.18)$$

Actually,  it  is  possible  that  the  derivative  of  the  CDF  for  a  continuous  random  variable  does  not 
exist  at  certain  points  (i.e.,  when  the  slopes  of  Fx(x)  approaching  from  the  left  and  the  right  are 
different).  The  PDF  at  these  points  can  be  defined  as  either  the  left  or  the  right  slope;  it  does  not 
make a difference in our probability computations, which involve integrating the PDF (which
washes  away  the  effect  of  individual  points).  We  therefore  do  not  worry  about  this  technicality 
any  further. 

We  obtain  the  CDF  from  the  PDF  by  integrating  the  relationship  (5.18): 

$$F_X(x) = \int_{-\infty}^{x} p_X(z)\,dz \qquad (5.19)$$


It  is  also  useful  to  define  the  complementary  CDF. 

Complementary cumulative distribution function (CCDF): The CCDF of a random variable $X$ is defined as

$$F_X^c(x) = P[X > x] = 1 - F_X(x)$$

The  CCDF  is  often  useful  in  talking  about  tail  probabilities  (e.g.,  the  probability  that  a  noise 
sample  takes  a  large  value,  causing  an  error  at  the  receiver).  For  a  continuous  random  variable 
with PDF $p_X(x)$, the CCDF is given by

$$F_X^c(x) = \int_{x}^{\infty} p_X(z)\,dz \qquad (5.20)$$


We  now  list  a  few  more  commonly  encountered  random  variables. 

Exponential random variable: The random variable $X$ has an exponential distribution with parameter $\lambda$, which we denote as $X \sim \text{Exp}(\lambda)$, if its PDF is given by

$$p_X(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & x < 0 \end{cases}$$

Figure 5.4: PDF of an exponential random variable with parameter $\lambda = 1/5$ (or mean $1/\lambda = 5$).

See  Figure  5.4  for  an  example  PDF.  We  can  write  this  more  compactly  using  the  indicator 
function: 

$$p_X(x) = \lambda e^{-\lambda x} I_{[0,\infty)}(x)$$

The CDF is given by

$$F_X(x) = (1 - e^{-\lambda x}) I_{[0,\infty)}(x)$$

For $x \ge 0$, the CCDF is given by

$$F_X^c(x) = P[X > x] = e^{-\lambda x}$$

That  is,  the  tail  of  an  exponential  distribution  decays  (as  befits  its  name)  exponentially. 
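
The relation (5.20) between the PDF and the CCDF can be checked numerically for the exponential distribution. The sketch below uses the parameter of Figure 5.4 and a simple midpoint rule; the truncation of the upper limit at 200 and the helper names are assumptions made for illustration.

```python
import math

LAM = 1 / 5  # parameter used in Figure 5.4


def pdf(x):
    """Exponential PDF with parameter LAM."""
    return LAM * math.exp(-LAM * x) if x >= 0 else 0.0


def ccdf(x):
    """Closed-form CCDF, P[X > x]."""
    return math.exp(-LAM * x) if x >= 0 else 1.0


def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


x0 = 10.0
# Integrate the PDF over [x0, 200]; the mass beyond 200 is about e^{-40}.
tail = integrate(pdf, x0, 200.0)
print(tail, ccdf(x0))
```

The two printed numbers should agree to several decimal places, with the closed form equal to $e^{-2}$ for this choice of x0.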


Figure 5.5: PDF of a Gaussian random variable with parameters $m = 5$ and $v^2 = 16$. Note the bell shape for the Gaussian density, with peak around its mean $m = 5$.

Gaussian (or normal) random variable: The random variable $X$ has a Gaussian distribution with parameters $m$ and $v^2$ if its PDF is given by

$$p_X(x) = \frac{1}{\sqrt{2\pi v^2}} \exp\left(-\frac{(x - m)^2}{2v^2}\right) \qquad (5.21)$$

See Figure 5.5 for an example PDF. As we show in Section 5.5, $m$ is the mean of $X$ and $v^2$ is
its  variance.  The  PDF  of  a  Gaussian  has  a  well-known  bell  shape,  as  shown  in  Figure  5.5.  The 
Gaussian  random  variable  plays  a  very  important  role  in  communication  system  design,  hence 
we  discuss  it  in  far  more  detail  in  Section  5.6,  as  a  prerequisite  for  the  receiver  design  principles 
to  be  developed  in  Chapter  6. 


Example 5.2.1 (Recognizing a Gaussian density): Suppose that a random variable $X$ has PDF

$$p_X(x) = c\,e^{-2x^2 + x}$$

where $c$ is an unknown constant, and $x$ ranges over the real line. Specify the distribution of $X$ and write down its PDF.

Solution: Any PDF with an exponential dependence on a quadratic can be put in the form (5.21) by completing squares in the exponent.

$$-2x^2 + x = -2\left(x^2 - \frac{x}{2}\right) = -2\left[\left(x - \frac{1}{4}\right)^2 - \frac{1}{16}\right]$$

Comparing with (5.21), we see that the PDF can be written as an $N(m, v^2)$ PDF with $m = \frac{1}{4}$ and $\frac{1}{2v^2} = 2$, so that $v^2 = \frac{1}{4}$. Thus, $X \sim N(\frac{1}{4}, \frac{1}{4})$ and its PDF is given by specializing (5.21):

$$p_X(x) = \frac{1}{\sqrt{2\pi/4}}\, e^{-2(x - \frac{1}{4})^2} = \sqrt{2/\pi}\; e^{-2x^2 + x - \frac{1}{8}}$$

We usually do not really care about going back and specifying the constant $c$, since we already know the form of the density. But it is easy to check that $c = \sqrt{2/\pi}\; e^{-1/8}$.
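
The result of Example 5.2.1 can be verified numerically: normalizing $e^{-2x^2 + x}$ should give the constant $c$ found above, and the resulting density should have mean 1/4 and variance 1/4. The sketch below uses a midpoint rule over [-10, 10], an integration range chosen here as an assumption (the integrand is negligible outside it); the helper names are hypothetical.

```python
import math


def unnormalized(x):
    """The density of Example 5.2.1 with the constant c set to 1."""
    return math.exp(-2 * x * x + x)


def integrate(f, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


area = integrate(unnormalized, -10.0, 10.0)
c_numeric = 1 / area                                   # normalizing constant
c_closed = math.sqrt(2 / math.pi) * math.exp(-1 / 8)   # value from the example

mean = c_numeric * integrate(lambda x: x * unnormalized(x), -10.0, 10.0)
var = c_numeric * integrate(
    lambda x: (x - mean) ** 2 * unnormalized(x), -10.0, 10.0
)
print(c_numeric, c_closed, mean, var)
```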


Binomial random variable: We say that a random variable $Y$ has a binomial distribution with parameters $n$ and $p$, and denote this by $Y \sim \text{Bin}(n, p)$, if $Y$ takes integer values $0, 1, \ldots, n$, with probability mass function

$$p_k = P[Y = k] = \binom{n}{k} p^k (1 - p)^{n-k}, \quad k = 0, 1, \ldots, n$$

Recall that "$n$ choose $k$" (the number of ways in which we can choose $k$ items out of $n$ distinct items) is given by the expression

$$\binom{n}{k} = \frac{n!}{k!\,(n - k)!}$$

with $k! = 1 \times 2 \times \cdots \times k$ denoting the factorial operation. The binomial distribution can be thought of as a discrete-time analogue of the Gaussian distribution; as seen in Figure 5.6, the PMF has a bell shape. We comment in more detail on this when we discuss the central limit theorem in Appendix 5.B.

Poisson random variable: $X$ is a Poisson random variable with parameter $\lambda > 0$ if it takes values from the nonnegative integers, with PMF given by

$$P[X = k] = \frac{\lambda^k}{k!}\, e^{-\lambda}, \quad k = 0, 1, 2, \ldots$$

As shown later, the parameter $\lambda$ equals the mean of the Poisson random variable.
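
The binomial and Poisson PMFs are easy to evaluate directly. The sketch below uses the binomial parameters of Figure 5.6 and a hypothetical Poisson parameter of 3; the function names are illustrative, and the Poisson sums are truncated at k = 59, beyond which the terms are negligible for this parameter.

```python
import math


def binomial_pmf(k, n, p):
    """P[Y = k] for Y ~ Bin(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P[X = k] for a Poisson random variable with parameter lam."""
    return lam**k / math.factorial(k) * math.exp(-lam)


n, p = 20, 0.3   # parameters of Figure 5.6
lam = 3.0        # hypothetical Poisson parameter

binom = [binomial_pmf(k, n, p) for k in range(n + 1)]
binom_mode = max(range(n + 1), key=lambda k: binom[k])  # location of the peak
poisson_total = sum(poisson_pmf(k, lam) for k in range(60))
poisson_mean = sum(k * poisson_pmf(k, lam) for k in range(60))
print(sum(binom), binom_mode, poisson_total, poisson_mean)
```

Both PMFs sum to one (up to rounding), and the truncated Poisson mean comes out equal to the parameter, consistent with the remark above.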


Figure  5.6:  PMF  of  a  binomial  random  variable  with  n  =  20  and  p  =  0.3. 

5.3  Multiple  Random  Variables,  or  Random  Vectors 


Figure  5.7:  Multiple  random  variables  defined  on  a  common  probability  space. 


We  are  often  interested  in  more  than  one  random  variable  when  modeling  a  particular  scenario 
of  interest.  For  example,  a  model  of  a  received  sample  in  a  communication  link  may  involve  a 
randomly  chosen  transmitted  bit,  a  random  channel  gain,  and  a  random  noise  sample.  In  general, 
we  are  interested  in  multiple  random  variables  defined  on  a  “common  probability  space,”  where 
the  latter  phrase  means  simply  that  we  can,  in  principle,  compute  the  probability  of  events 
involving  all  of  these  random  variables.  Technically,  multiple  random  variables  on  a  common 
probability  space  are  simply  different  mappings  from  the  sample  space  to  the  real  line,  as 
depicted  in  Figure  5.7.  However,  in  practice,  we  do  not  usually  worry  about  the  underlying 
sample  space  (which  can  be  very  complicated),  and  simply  specify  the  joint  distribution  of  these 
random  variables,  which  provides  information  sufficient  to  compute  the  probabilities  of  events 
involving  these  random  variables. 

In the following, suppose that $X_1, \ldots, X_n$ are random variables defined on a common probability space; we can also represent them as an $n$-dimensional random vector $\mathbf{X} = (X_1, \ldots, X_n)^T$.

Joint Cumulative Distribution Function: The joint CDF is defined as

$$F_{\mathbf{X}}(\mathbf{x}) = F_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = P[X_1 \le x_1, \ldots, X_n \le x_n]$$

Joint Probability Density Function: When the joint CDF is continuous, we can define the joint PDF as follows:

$$p_{\mathbf{X}}(\mathbf{x}) = p_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = \frac{\partial}{\partial x_1} \cdots \frac{\partial}{\partial x_n} F_{X_1,\ldots,X_n}(x_1, \ldots, x_n)$$

We can recover the joint CDF from the joint PDF by integrating:

$$F_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} p_{X_1,\ldots,X_n}(u_1, \ldots, u_n)\,du_1 \cdots du_n$$

The joint PDF must be nonnegative and must integrate to one over $n$-dimensional space. The probability of a particular subset of $n$-dimensional space is obtained by integrating the joint PDF over the subset.

Joint Probability Mass Function (PMF): For discrete random variables, the joint PMF is defined as

$$p_{\mathbf{X}}(\mathbf{x}) = p_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = P[X_1 = x_1, \ldots, X_n = x_n]$$

Marginal distributions: The marginal distribution for a given random variable (or set of random variables) can be obtained by integrating or summing over all possible values of the random variables that we are not interested in. For CDFs, this simply corresponds to setting the appropriate arguments in the joint CDF to infinity. For example,

$$F_X(x) = P[X \le x] = P[X \le x, Y \le \infty] = F_{X,Y}(x, \infty)$$

For continuous random variables, the marginal PDF is obtained from the joint PDF by "integrating out" the undesired random variable:

$$p_X(x) = \int_{-\infty}^{\infty} p_{X,Y}(x, y)\,dy, \quad -\infty < x < \infty$$

For discrete random variables, we sum over the possible values of the undesired random variable:

$$p_X(x) = \sum_{y \in \mathcal{Y}} p_{X,Y}(x, y), \quad x \in \mathcal{X}$$

where $\mathcal{X}$ and $\mathcal{Y}$ denote the set of possible values taken by $X$ and $Y$, respectively.
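
As a small illustration of marginalization for discrete random variables, the sketch below builds the joint PMF of a transmitted bit $X$ and a received bit $Y$ from the prior and channel transition probabilities used in Section 5.1, and then sums out one variable. The dictionary layout and names are hypothetical, and P[1 received | 1 transmitted] = 0.75 is inferred as the complement of 0.25.

```python
prior = {0: 0.6, 1: 0.4}  # P[X = x]
# P[Y = y | X = x], keyed as (y, x).
p_y_given_x = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.25, (1, 1): 0.75}

# Joint PMF p_{X,Y}(x, y) = P[X = x] P[Y = y | X = x].
joint = {
    (x, y): prior[x] * p_y_given_x[(y, x)] for x in (0, 1) for y in (0, 1)
}


def marginal_x(x):
    """Sum the joint PMF over all values of Y."""
    return sum(joint[(x, y)] for y in (0, 1))


def marginal_y(y):
    """Sum the joint PMF over all values of X."""
    return sum(joint[(x, y)] for x in (0, 1))


print(round(marginal_x(0), 2), round(marginal_y(0), 2))  # 0.6 0.64
```

The marginal of $Y$ at 0 reproduces the denominator 0.64 obtained from the law of total probability in the Bayes' rule example of Section 5.1.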


Example 5.3.1 (Joint and marginal densities): Random variables $X$ and $Y$ have joint density given by

$$p_{X,Y}(x, y) = \begin{cases} c\,xy, & 0 \le x, y \le 1 \\ 2c\,xy, & -1 \le x, y \le 0 \\ 0, & \text{else} \end{cases}$$

where  the  constant  c  is  not  specified. 

(a)  Find  the  value  of  c. 

(b) Find $P[X + Y \le 1]$.

(c)  Specify  the  marginal  distribution  of  X. 

Solution: 

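(a) We find the constant using the observation that the joint density must integrate to one:

$$1 = \int\!\!\int p_{X,Y}(x, y)\,dx\,dy = c \int_0^1\!\!\int_0^1 xy\,dx\,dy + 2c \int_{-1}^0\!\!\int_{-1}^0 xy\,dx\,dy$$

$$= c \left.\frac{x^2}{2}\right|_0^1 \left.\frac{y^2}{2}\right|_0^1 + 2c \left.\frac{x^2}{2}\right|_{-1}^0 \left.\frac{y^2}{2}\right|_{-1}^0 = \frac{c}{4} + \frac{2c}{4} = \frac{3c}{4}$$

Thus, $c = 4/3$.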

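(b) The required probability is obtained by integrating the joint density over the shaded area in Figure 5.8. We obtain

$$P[X + Y \le 1] = \int_{y=0}^{1} \int_{x=0}^{1-y} c\,xy\,dx\,dy + \int_{y=-1}^{0} \int_{x=-1}^{0} 2c\,xy\,dx\,dy$$

$$= c \int_{y=0}^{1} y \left.\frac{x^2}{2}\right|_{x=0}^{1-y} dy + 2c \left.\frac{x^2}{2}\right|_{-1}^{0} \left.\frac{y^2}{2}\right|_{-1}^{0}$$

$$= c \int_{y=0}^{1} y\,\frac{(1 - y)^2}{2}\,dy + \frac{2c}{4} = \frac{c}{24} + \frac{c}{2} = \frac{13c}{24} = \frac{13}{18}$$

We could have computed this probability more quickly in this example by integrating the joint density over the unshaded area to find $P[X + Y > 1]$, since this area has a simpler shape:

$$P[X + Y > 1] = \int_{y=0}^{1} \int_{x=1-y}^{1} c\,xy\,dx\,dy = c \int_{y=0}^{1} y \left.\frac{x^2}{2}\right|_{x=1-y}^{1} dy = \frac{c}{2} \int_{y=0}^{1} y\,(2y - y^2)\,dy = \frac{5c}{24} = \frac{5}{18}$$

from which we get that $P[X + Y \le 1] = 1 - P[X + Y > 1] = 13/18$.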

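(c) The marginal density of $X$ is found by integrating the joint density over all possible values of $Y$. For $0 \le x \le 1$, we obtain

$$p_X(x) = \int_{0}^{1} c\,xy\,dy = c\,x \left.\frac{y^2}{2}\right|_{y=0}^{1} = \frac{c\,x}{2} = \frac{2x}{3} \qquad (5.22)$$

For $-1 \le x \le 0$, we have

$$p_X(x) = \int_{-1}^{0} 2c\,xy\,dy = 2c\,x \left.\frac{y^2}{2}\right|_{y=-1}^{0} = -c\,x = -\frac{4x}{3} \qquad (5.23)$$

Conditional density: The conditional density of $Y$ given $X$ is defined as

$$p_{Y|X}(y|x) = \frac{p_{X,Y}(x, y)}{p_X(x)} \qquad (5.24)$$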


where the definition applies for both PDFs and PMFs, and where we are interested in values of $x$ such that $p_X(x) > 0$. For jointly continuous $X$ and $Y$, the conditional density $p(y|x)$ has the interpretation

$$p_{Y|X}(y|x)\,\Delta y \approx P\left[Y \in [y, y + \Delta y] \mid X \in [x, x + \Delta x]\right] \qquad (5.25)$$

for $\Delta x$, $\Delta y$ small. For discrete random variables, the conditional PMF is simply the following conditional probability:

$$p_{Y|X}(y|x) = P[Y = y \mid X = x] \qquad (5.26)$$


Example 5.3.2 Continuing with Example 5.3.1, let us find the conditional density of $Y$ given $X$. For $X = x \in [0, 1]$, we have $p_{X,Y}(x, y) = c\,xy$, with $0 \le y \le 1$ (the joint density is zero for other values of $y$, under this conditioning on $X$). Applying (5.24), and substituting (5.22), we obtain

$$p_{Y|X}(y|x) = \frac{p_{X,Y}(x, y)}{p_X(x)} = \frac{c\,xy}{c\,x/2} = 2y, \quad 0 \le y \le 1 \quad (\text{for } 0 \le x \le 1)$$

Similarly, for $X = x \in [-1, 0]$, we obtain, using (5.23), that

$$p_{Y|X}(y|x) = \frac{p_{X,Y}(x, y)}{p_X(x)} = \frac{2c\,xy}{-c\,x} = -2y, \quad -1 \le y \le 0 \quad (\text{for } -1 \le x \le 0)$$

We can now compute conditional probabilities using the preceding conditional densities. For example,

$$P[Y < -0.5 \mid X = -0.5] = \int_{-1}^{-0.5} (-2y)\,dy = \left. -y^2 \right|_{-1}^{-0.5} = 3/4$$

whereas $P[Y < 0.5 \mid X = -0.5] = 1$ (why?).
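
The numbers in Examples 5.3.1 and 5.3.2 can be checked with a brute-force numerical integration of the joint density on a grid. This is only a sanity check under assumed settings (a midpoint grid with 400 cells per axis over [-1, 1]); the grid size and all names are hypothetical, and the probability of the triangular region is accurate only to roughly the grid resolution.

```python
def joint_unnormalized(x, y):
    """Joint density of Example 5.3.1 with the constant c set to 1."""
    if 0 <= x <= 1 and 0 <= y <= 1:
        return x * y
    if -1 <= x <= 0 and -1 <= y <= 0:
        return 2 * x * y
    return 0.0


N = 400                      # grid cells per axis over [-1, 1]
H = 2.0 / N                  # cell width
mids = [-1 + (i + 0.5) * H for i in range(N)]  # cell midpoints

# (a) The density must integrate to one, which fixes c.
total = sum(joint_unnormalized(x, y) for x in mids for y in mids) * H * H
c = 1 / total

# (b) Integrate the normalized density over the region x + y <= 1.
p_sum_le_1 = c * H * H * sum(
    joint_unnormalized(x, y) for x in mids for y in mids if x + y <= 1
)

# Example 5.3.2: P[Y < -0.5 | X = -0.5], from a slice of the joint density.
x0 = -0.5
slice_all = sum(joint_unnormalized(x0, y) for y in mids)
slice_low = sum(joint_unnormalized(x0, y) for y in mids if y <= -0.5)
p_cond = slice_low / slice_all

print(c, p_sum_le_1, p_cond)
```

The printed values should be close to 4/3, 13/18 and 3/4, respectively.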


Bayes' rule for conditional densities: Given the conditional density of $Y$ given $X$, the conditional density for $X$ given $Y$ is given by

$$p_{X|Y}(x|y) = \frac{p_{Y|X}(y|x)\,p_X(x)}{p_Y(y)} = \frac{p_{Y|X}(y|x)\,p_X(x)}{\int p_{Y|X}(y|x)\,p_X(x)\,dx}, \quad \text{continuous random variables}$$

$$p_{X|Y}(x|y) = \frac{p_{Y|X}(y|x)\,p_X(x)}{p_Y(y)} = \frac{p_{Y|X}(y|x)\,p_X(x)}{\sum_x p_{Y|X}(y|x)\,p_X(x)}, \quad \text{discrete random variables}$$


We  can  also  mix  discrete  and  continuous  random  variables  in  applying  Bayes’  rule,  as  illustrated 
in  the  following  example. 


Example 5.3.3 (Conditional probability and Bayes' rule with discrete and continuous random variables) A bit sent by a transmitter is modeled as a random variable $X$ taking values 0 and 1 with equal probability. The corresponding observation at the receiver is modeled by a real-valued random variable $Y$. The conditional distribution of $Y$ given $X = 0$ is $N(0, 4)$. The conditional distribution of $Y$ given $X = 1$ is $N(10, 4)$. This might happen, for example, with on-off signaling, where we send a signal to send 1, and send nothing when we want to send 0. The receiver therefore sees signal plus noise if 1 is sent, and sees only noise if 0 is sent, and the observation $Y$, presumably obtained by processing the received signal, has zero mean if 0 is sent, and nonzero mean if 1 is sent.

(a) Write down the conditional densities of $Y$ given $X = 0$ and $X = 1$, respectively.

(b) Find $P[Y = 7 \mid X = 0]$, $P[Y = 7 \mid X = 1]$ and $P[Y = 7]$.

(c) Find $P[Y > 7 \mid X = 0]$.

(d) Find $P[Y > 7 \mid X = 1]$.

(e) Find $P[X = 0 \mid Y = 7]$.

Solution to (a): We simply plug in numbers into the expression (5.21) for the Gaussian density to obtain:

$$p(y|x = 0) = \frac{1}{\sqrt{8\pi}}\, e^{-y^2/8}, \qquad p(y|x = 1) = \frac{1}{\sqrt{8\pi}}\, e^{-(y - 10)^2/8}$$

Solution to (b): Conditioned on $X = 0$, $Y$ is a continuous random variable, so the probability of taking a particular value is zero. Thus, $P[Y = 7 \mid X = 0] = 0$. By the same reasoning, $P[Y = 7 \mid X = 1] = 0$. The unconditional probability is given by the law of total probability:

$$P[Y = 7] = P[Y = 7 \mid X = 0]\,P[X = 0] + P[Y = 7 \mid X = 1]\,P[X = 1] = 0$$


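Solution to (c): Finding the probability of $Y$ lying in a region, conditioned on $X = 0$, simply involves integrating the conditional density over that region. We therefore have

$$P[Y > 7 \mid X = 0] = \int_{7}^{\infty} p(y|x = 0)\,dy = \int_{7}^{\infty} \frac{1}{\sqrt{8\pi}}\, e^{-y^2/8}\,dy$$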

We shall see in Section 5.6 how to express such probabilities involving Gaussian densities in compact form using standard functions (which can be evaluated using built-in functions in Matlab),
but  for  now,  we  leave  the  desired  probability  in  terms  of  the  integral  given  above. 

Solution to (d): This is analogous to (c), except that we integrate the conditional density of $Y$ given $X = 1$:

$$P[Y > 7 \mid X = 1] = \int_{7}^{\infty} p(y|x = 1)\,dy = \int_{7}^{\infty} \frac{1}{\sqrt{8\pi}}\, e^{-(y - 10)^2/8}\,dy$$

Solution to (e): Now we want to apply Bayes' rule to find $P[X = 0 \mid Y = 7]$. But we know from (b) that the event $\{Y = 7\}$ has zero probability. How do we condition on an event that never happens? The answer is that we define $P[X = 0 \mid Y = 7]$ to be the limit of $P[X = 0 \mid Y \in (7 - \epsilon, 7 + \epsilon)]$ as $\epsilon \to 0$. For any $\epsilon > 0$, the event that we are conditioning on, $\{Y \in (7 - \epsilon, 7 + \epsilon)\}$, has nonzero probability, and we can show by methods beyond our present scope that one does get a well-defined limit as $\epsilon$ tends to zero. However, we do not need to worry about such technicalities when computing this conditional probability: we can simply compute it (for an arbitrary value of $Y = y$) as

$$P[X = 0 \mid Y = y] = \frac{p_{Y|X}(y|0)\,P[X = 0]}{p_Y(y)} = \frac{p_{Y|X}(y|0)\,P[X = 0]}{p_{Y|X}(y|0)\,P[X = 0] + p_{Y|X}(y|1)\,P[X = 1]}$$

Substituting the conditional densities from (a) and setting $P[X = 0] = P[X = 1] = 1/2$, we obtain

$$P[X = 0 \mid Y = y] = \frac{\frac{1}{2} e^{-y^2/8}}{\frac{1}{2} e^{-y^2/8} + \frac{1}{2} e^{-(y - 10)^2/8}} = \frac{1}{1 + e^{5(y - 5)/2}}$$

Plugging in $y = 7$, we obtain

$$P[X = 0 \mid Y = 7] = 0.0067$$

which of course implies that

$$P[X = 1 \mid Y = 7] = 1 - P[X = 0 \mid Y = 7] = 0.9933$$


Before  seeing  Y,  we  knew  only  that  0  or  1  were  sent  with  equal  probability.  After  seeing  Y  =  7, 
however,  our  model  tells  us  that  1  was  far  more  likely  to  have  been  sent.  This  is  of  course  what  we 
want  in  a  reliable  communication  system:  we  begin  by  not  knowing  the  transmitted  information 
at  the  receiver  (otherwise  there  would  be  no  point  in  sending  it),  but  after  seeing  the  received 
signal,  we  can  infer  it  with  high  probability.  We  shall  see  many  more  such  computations  in  the 
next  chapter:  conditional  distributions  and  probabilities  are  fundamental  to  principled  receiver 
design. 
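
The computations of Example 5.3.3 can be reproduced numerically. The sketch below evaluates the posterior in part (e) directly from Bayes' rule, and, as an assumption that goes slightly beyond the text at this point, evaluates the Gaussian tail integrals of parts (c) and (d) using the complementary error function from Python's standard math module. All function names are hypothetical.

```python
import math


def gaussian_pdf(y, m, v2):
    """N(m, v2) density evaluated at y, as in (5.21)."""
    return math.exp(-((y - m) ** 2) / (2 * v2)) / math.sqrt(2 * math.pi * v2)


def gaussian_tail(y, m, v2):
    """P[Y > y] for Y ~ N(m, v2), via the complementary error function."""
    return 0.5 * math.erfc((y - m) / math.sqrt(2 * v2))


def posterior_zero(y):
    """P[X = 0 | Y = y] with equal priors, Y|X=0 ~ N(0,4), Y|X=1 ~ N(10,4)."""
    num = 0.5 * gaussian_pdf(y, 0, 4)
    den = num + 0.5 * gaussian_pdf(y, 10, 4)
    return num / den


print(round(posterior_zero(7), 4))   # 0.0067, matching part (e)
print(gaussian_tail(7, 0, 4))        # part (c): P[Y > 7 | X = 0]
print(gaussian_tail(7, 10, 4))       # part (d): P[Y > 7 | X = 1]
```

The posterior agrees with the closed form $1/(1 + e^{5(y-5)/2})$ evaluated at $y = 7$.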
Independent Random Variables: Random variables $X_1, \ldots, X_n$ are independent if

$$P[X_1 \in A_1, \ldots, X_n \in A_n] = P[X_1 \in A_1] \cdots P[X_n \in A_n]$$

for any subsets $A_1, \ldots, A_n$. That is, events defined in terms of values taken by these random variables are independent of each other. This implies, for example, that the conditional probability of an event defined in terms of one of these random variables, conditioned on events defined in terms of the other random variables, equals the unconditional probability:

$$P[X_1 \in A_1 \mid X_2 \in A_2, \ldots, X_n \in A_n] = P[X_1 \in A_1]$$

In  terms  of  distributions  and  densities,  independence  means  that  joint  distributions  are  products 
of  marginal  distributions,  and  joint  densities  are  products  of  marginal  densities. 

Joint distribution is product of marginals for independent random variables: If $X_1, \ldots, X_n$ are independent, then their joint CDF is a product of the marginal CDFs:

$$F_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = F_{X_1}(x_1) \cdots F_{X_n}(x_n)$$

and their joint density (PDF or PMF) is a product of the marginal densities:

$$p_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = p_{X_1}(x_1) \cdots p_{X_n}(x_n)$$

Independent and identically distributed (i.i.d.) random variables: We are often interested in collections of independent random variables in which each random variable has the same
marginal  distribution.  We  call  such  random  variables  independent  and  identically  distributed. 

Example 5.3.4 (A sum of i.i.d. Bernoulli random variables is a Binomial random variable): Let $X_1, \ldots, X_n$ denote i.i.d. Bernoulli random variables with $P[X_1 = 1] = 1 - P[X_1 = 0] = p$, and let $Y = X_1 + \cdots + X_n$ denote their sum. We could think of $X_i$ as denoting whether the $i$th coin flip (of a possibly biased coin, if $p \ne \frac{1}{2}$) yields heads, where successive flips have independent outcomes, so that $Y$ is the number of heads obtained in $n$ flips. For communications applications, $X_i$ could denote whether the $i$th bit in a sequence of $n$ bits is incorrectly received, with successive bit errors modeled as independent, so that $Y$ is the total number of bit errors. The random variable $Y$ takes discrete values in $\{0, 1, \ldots, n\}$. Its PMF is given by

$$P[Y = k] = \binom{n}{k} p^k (1 - p)^{n-k}, \quad k = 0, 1, \ldots, n$$

That is, $Y \sim \text{Bin}(n, p)$. To see why, note that $Y = k$ requires that exactly $k$ of the $\{X_i\}$ take value 1, with the remaining $n - k$ taking value 0. Let us compute the probability of one such outcome, $\{X_1 = 1, \ldots, X_k = 1, X_{k+1} = 0, \ldots, X_n = 0\}$:

$$P[X_1 = 1, \ldots, X_k = 1, X_{k+1} = 0, \ldots, X_n = 0] = P[X_1 = 1] \cdots P[X_k = 1]\,P[X_{k+1} = 0] \cdots P[X_n = 0] = p^k (1 - p)^{n-k}$$

Clearly, any other outcome with exactly $k$ ones has the same probability, given the i.i.d. nature of the $\{X_i\}$. We can now sum over the probabilities of these mutually exclusive events, noting that there are exactly "$n$ choose $k$" such outcomes (the number of ways in which we can choose the $k$ random variables $\{X_i\}$ which take the value one) to obtain the desired PMF.

Density of sum of independent random variables: Suppose that $X_1$ and $X_2$ are independent continuous random variables, and let $Y = X_1 + X_2$. Then the PDF of $Y$ is a convolution of the PDFs of $X_1$ and $X_2$:

$$p_Y(y) = (p_{X_1} * p_{X_2})(y) = \int_{-\infty}^{\infty} p_{X_1}(x_1)\,p_{X_2}(y - x_1)\,dx_1$$

For discrete random variables, the same result holds, except that the PMF is given by a discrete-time convolution of the PMFs of $X_1$ and $X_2$.
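
The discrete-time convolution result gives another way to see the claim of Example 5.3.4: convolving the Bernoulli PMF with itself $n$ times should reproduce the binomial PMF. The following sketch does this with a hand-written convolution; the function names and the parameter values (those of Figure 5.6) are chosen here for illustration.

```python
import math


def convolve(a, b):
    """Discrete-time convolution of two PMFs supported on 0, 1, 2, ..."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def sum_of_bernoullis_pmf(n, p):
    """PMF of the sum of n i.i.d. Bernoulli(p) variables, by repeated convolution."""
    pmf = [1.0]  # PMF of an empty sum, which equals 0 with probability one
    for _ in range(n):
        pmf = convolve(pmf, [1 - p, p])
    return pmf


def binomial_pmf(n, p):
    """Closed-form Bin(n, p) PMF."""
    return [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


by_convolution = sum_of_bernoullis_pmf(20, 0.3)
closed_form = binomial_pmf(20, 0.3)
max_gap = max(abs(a - b) for a, b in zip(by_convolution, closed_form))
print(max_gap)  # tiny: the two PMFs agree up to rounding error
```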
Figure 5.9: The sum of two independent uniform random variables has a PDF with trapezoidal shape, obtained by convolving two boxcar-shaped PDFs.


Example 5.3.5 (Sum of two uniform random variables) Suppose that $X_1$ is uniformly distributed over $[0, 1]$, and $X_2$ is uniformly distributed over $[-1, 1]$. Then $Y = X_1 + X_2$ takes values in the interval $[-1, 2]$, and its density is the convolution shown in Figure 5.9.
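
The trapezoidal shape in Example 5.3.5 can be reproduced by evaluating the convolution integral numerically, assuming (as in the caption of Figure 5.9) that the two uniform random variables are independent. The midpoint-rule step count and the function names below are assumptions made for this sketch.

```python
def p_x1(x):
    """PDF of X1, uniform on [0, 1]."""
    return 1.0 if 0 <= x <= 1 else 0.0


def p_x2(x):
    """PDF of X2, uniform on [-1, 1]."""
    return 0.5 if -1 <= x <= 1 else 0.0


def p_y(y, n=2000):
    """Convolution integral for Y = X1 + X2, by the midpoint rule over [0, 1]."""
    h = 1.0 / n  # p_x1 vanishes outside [0, 1], so that range suffices
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += p_x1(x) * p_x2(y - x)
    return h * total


for y in (-1.5, -0.5, 0.5, 1.5, 2.5):
    print(y, round(p_y(y), 3))
```

The density rises linearly on $[-1, 0]$, is flat at 1/2 on $[0, 1]$, and falls linearly on $[1, 2]$, which is the trapezoid of Figure 5.9.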


Of  particular  interest  to  us  are  jointly  Gaussian  random  variables,  which  we  discuss  in  more 
detail  in  Section  5.6. 

Notational simplification: In the preceding definitions, we have distinguished between different random variables by using subscripts. For example, the joint density of $X$ and $Y$ is denoted by $p_{X,Y}(x, y)$, where $X$, $Y$ denote the random variables, and $x$, $y$ are dummy variables that we might, for example, integrate over when evaluating a probability. We could easily use some other notation for the dummy variables, e.g., the joint density could be denoted as $p_{X,Y}(u, v)$. After all, we know that we are talking about the joint density of $X$ and $Y$ because of the subscripts. However, carrying around the subscripts is cumbersome. Therefore, from now on, when there is no scope for confusion, we drop the subscripts and use the dummy variables to also denote the random variables we are talking about. For example, we now use $p(x, y)$ as shorthand for $p_{X,Y}(x, y)$, choosing the dummy variables to be lower case versions of the random variables they are associated with. Similarly, we use $p(x)$ to denote the density of $X$, $p(y)$ to denote the density of $Y$, and $p(y|x)$ to denote the conditional density of $Y$ given $X$. Of course, we revert to the subscript-based notation whenever there is any possibility of confusion.


5.4  Functions  of  random  variables 


Figure  5.10:  A  function  of  a  random  variable  is  also  a  random  variable. 


We review here methods of determining the distribution of functions of random variables. If
$X = X(\omega)$ is a random variable, so is $Y(\omega) = g(X(\omega))$, since it is a mapping from the sample
space to the real line which is a composition of the original mapping $X$ and the function $g$, as
shown in Figure 5.10.

Method 1 (find the CDF first): We proceed from the definition to find the CDF of $Y = g(X)$
as follows:

$$F_Y(y) = P[Y \le y] = P[g(X) \le y] = P[X \in A(y)]$$

where $A(y) = \{x : g(x) \le y\}$. We can now use the CDF or density of $X$ to evaluate the extreme
right-hand side. Once we find the CDF of $Y$, we can find the PMF or PDF as usual.


Figure 5.11: Finding the CDF of $Y = X^2$. (The plot shows the curve $y = x^2$ and the range of $X$ corresponding to $Y \le y$.)


Example 5.4.1 (Application of Method 1) Suppose that $X$ is a Laplacian random variable
with density

$$p_X(x) = \frac{1}{2} e^{-|x|}$$

Find the CDF and PDF of $Y = X^2$.

In Method 1, we find the CDF of $Y$ first, and then differentiate to find the PDF. First, note that
$Y$ takes only nonnegative values, so that $F_Y(y) = 0$ for $y < 0$. For $y \ge 0$, we have

$$F_Y(y) = P[Y \le y] = P[X^2 \le y] = P[-\sqrt{y} \le X \le \sqrt{y}] = \int_{-\sqrt{y}}^{\sqrt{y}} p_X(x)\, dx = \int_{-\sqrt{y}}^{\sqrt{y}} \frac{1}{2} e^{-|x|}\, dx = \int_{0}^{\sqrt{y}} e^{-x}\, dx = 1 - e^{-\sqrt{y}}\,, \quad y \ge 0$$

We can now differentiate the CDF to obtain the PDF of $Y$:

$$p_Y(y) = \frac{d}{dy} F_Y(y) = \frac{e^{-\sqrt{y}}}{2\sqrt{y}}\,, \quad y \ge 0$$

(The CDF and PDF are zero for $y < 0$, since $Y$ only takes nonnegative values.)
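
As a sanity check on Method 1, the following Python sketch compares the closed-form CDF $1 - e^{-\sqrt{y}}$ with a direct numerical integration of the Laplacian density over $[-\sqrt{y}, \sqrt{y}]$. The function names and the midpoint-rule step count are illustrative assumptions, not part of the example itself.

```python
import math


def laplacian_pdf(x):
    """Laplacian density p_X(x) = (1/2) exp(-|x|)."""
    return 0.5 * math.exp(-abs(x))


def cdf_of_square_numeric(y, steps=20000):
    """Midpoint-rule approximation of P[-sqrt(y) <= X <= sqrt(y)] = P[X^2 <= y]."""
    if y <= 0:
        return 0.0
    a = math.sqrt(y)
    h = 2.0 * a / steps  # width of each integration cell
    return h * sum(laplacian_pdf(-a + (i + 0.5) * h) for i in range(steps))


def cdf_of_square_closed_form(y):
    """Closed-form CDF of Y = X^2 obtained with Method 1."""
    return 1.0 - math.exp(-math.sqrt(y)) if y > 0 else 0.0
```

Evaluating both functions at a few values of $y$ (say 0.25, 1, 4 and 9) should give agreement to several decimal places.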
Method 2 (find the PDF directly): For differentiable $g(x)$ and continuous random variables,
we can compute the PDF directly. Suppose that $g(x) = y$ is satisfied for $x = x_1, \ldots, x_m$. We
can then express $x_i$ as a function of $y$: $x_i = h_i(y)$. For $g(x) = x^2$, this corresponds to $x_1 = \sqrt{y}$
and $x_2 = -\sqrt{y}$. The probability of $X$ lying in a small interval $[x_i, x_i + \Delta x]$ is approximately
$p_X(x_i)\Delta x$, where we take the increment $\Delta x > 0$. For smooth $g$, this corresponds to $Y$ lying in a
small interval around $y$, where we need to sum up the probabilities corresponding to all possible
values of $x$ that get us near the desired value of $y$. We therefore get

$$p_Y(y)\,|\Delta y| = \sum_{i=1}^{m} p_X(x_i)\, \Delta x$$

where we take the magnitude of the $Y$ increment $\Delta y$ because a positive increment in $x$ can cause
a positive or negative increment in $g(x)$, depending on the slope at that point. We therefore
obtain

$$p_Y(y) = \sum_{i=1}^{m} \left. \frac{p_X(x_i)}{|dy/dx|} \right|_{x_i = h_i(y)} \qquad (5.27)$$


We  now  redo  Example  5.4.1  using  Method  2. 


Example 5.4.2 (application of Method 2) For the setting of Example 5.4.1, we wish to find
the PDF using Method 2. For $y = g(x) = x^2$, we have $x = \pm\sqrt{y}$ (we only consider $y \ge 0$, since
the PDF is zero for $y < 0$), with derivative $dy/dx = 2x$. We can now apply (5.27) to obtain:

$$p_Y(y) = \frac{p_X(\sqrt{y})}{|2\sqrt{y}|} + \frac{p_X(-\sqrt{y})}{|-2\sqrt{y}|} = \frac{e^{-\sqrt{y}}}{2\sqrt{y}}\,, \quad y \ge 0$$

as before.


Since Method 1 starts from the definition of CDF, it generalizes to multiple random variables
(i.e., random vectors) in a straightforward manner, at least in principle. For example, suppose
that $Y_1 = g_1(X_1, X_2)$ and $Y_2 = g_2(X_1, X_2)$. Then the joint CDF of $Y_1$ and $Y_2$ is given by

$$F_{Y_1,Y_2}(y_1,y_2) = P[Y_1 \le y_1, Y_2 \le y_2] = P[g_1(X_1,X_2) \le y_1,\ g_2(X_1,X_2) \le y_2] = P[(X_1,X_2) \in A(y_1,y_2)]$$

where $A(y_1,y_2) = \{(x_1,x_2) : g_1(x_1,x_2) \le y_1,\ g_2(x_1,x_2) \le y_2\}$. In principle, we can now use the
joint distribution to compute the preceding probability for each possible value of $(y_1,y_2)$. In
general, Method 1 works for $\mathbf{Y} = g(\mathbf{X})$, where $\mathbf{Y}$ is an $n$-dimensional random vector which is
a function of an $m$-dimensional random vector $\mathbf{X}$ (in the preceding, we considered $m = n = 2$).
However, evaluating probabilities involving $m$-dimensional random vectors can get pretty
complicated even for $m = 2$. A generalization of Method 2 is often preferred as a way of directly
obtaining PDFs when the functions involved are smooth enough, and when $m = n$. We review
this next.

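Method 2 for random vectors: Suppose that $\mathbf{Y} = (Y_1, \ldots, Y_n)^T$ is an $n \times 1$ random vector
which is a function of another $n \times 1$ vector $\mathbf{X} = (X_1, \ldots, X_n)^T$. That is, $\mathbf{Y} = g(\mathbf{X})$, or
$Y_k = g_k(X_1, \ldots, X_n)$, $k = 1, \ldots, n$. As before, suppose that $\mathbf{y} = g(\mathbf{x})$ has $m$ solutions, $\mathbf{x}_1, \ldots, \mathbf{x}_m$, with the
$i$th solution written in terms of $\mathbf{y}$ as $\mathbf{x}_i = h_i(\mathbf{y})$. The probability of $\mathbf{Y}$ lying in an infinitesimal
volume is now given by

$$p_{\mathbf{Y}}(\mathbf{y})\,|d\mathbf{y}| = \sum_{i=1}^{m} p_{\mathbf{X}}(\mathbf{x}_i)\, |d\mathbf{x}|$$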
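In order to relate the lengths of the vector increments $|d\mathbf{y}|$ and $|d\mathbf{x}|$, it no longer suffices to
consider a scalar derivative. We now need the Jacobian matrix of partial derivatives of $\mathbf{y} = g(\mathbf{x})$
with respect to $\mathbf{x}$, defined as:

$$\mathbf{J}(\mathbf{y};\mathbf{x}) = \begin{pmatrix} \frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \vdots & & \vdots \\ \frac{\partial y_n}{\partial x_1} & \cdots & \frac{\partial y_n}{\partial x_n} \end{pmatrix} \qquad (5.28)$$

The lengths of the vector increments are related as

$$|d\mathbf{y}| = |\det(\mathbf{J}(\mathbf{y};\mathbf{x}))|\, |d\mathbf{x}|$$

where $\det(\mathbf{M})$ denotes the determinant of a square matrix $\mathbf{M}$. Thus, if $\mathbf{y} = g(\mathbf{x})$ has $m$ solutions,
$\mathbf{x}_1, \ldots, \mathbf{x}_m$, with the $i$th solution written in terms of $\mathbf{y}$ as $\mathbf{x}_i = h_i(\mathbf{y})$, then the density at $\mathbf{y}$ is
given by

$$p_{\mathbf{Y}}(\mathbf{y}) = \sum_{i=1}^{m} \left. \frac{p_{\mathbf{X}}(\mathbf{x}_i)}{|\det(\mathbf{J}(\mathbf{y};\mathbf{x}))|} \right|_{\mathbf{x}_i = h_i(\mathbf{y})} \qquad (5.29)$$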


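Depending on how the functional relationship between $\mathbf{X}$ and $\mathbf{Y}$ is specified, it might sometimes
be more convenient to find the Jacobian of $\mathbf{x}$ with respect to $\mathbf{y}$:

$$\mathbf{J}(\mathbf{x};\mathbf{y}) = \begin{pmatrix} \frac{\partial x_1}{\partial y_1} & \cdots & \frac{\partial x_1}{\partial y_n} \\ \vdots & & \vdots \\ \frac{\partial x_n}{\partial y_1} & \cdots & \frac{\partial x_n}{\partial y_n} \end{pmatrix} \qquad (5.30)$$

We can use this in (5.29) by noting that the two Jacobian matrices for a given pair of values $(\mathbf{x}, \mathbf{y})$
are inverses of each other:

$$\mathbf{J}(\mathbf{x};\mathbf{y}) = (\mathbf{J}(\mathbf{y};\mathbf{x}))^{-1}$$

This implies that their determinants are reciprocals of each other:

$$\det(\mathbf{J}(\mathbf{x};\mathbf{y})) = \frac{1}{\det(\mathbf{J}(\mathbf{y};\mathbf{x}))}$$

We can therefore rewrite (5.29) as follows:

$$p_{\mathbf{Y}}(\mathbf{y}) = \sum_{i=1}^{m} \left. p_{\mathbf{X}}(\mathbf{x}_i)\, |\det(\mathbf{J}(\mathbf{x};\mathbf{y}))| \right|_{\mathbf{x}_i = h_i(\mathbf{y})} \qquad (5.31)$$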


Example 5.4.3 (Rectangular to Polar Transformation): For random variables $X_1$, $X_2$
with joint density $p_{X_1,X_2}$, think of $(X_1, X_2)$ as a point in two-dimensional space in Cartesian
coordinates. The corresponding polar coordinates are given by

$$R = \sqrt{X_1^2 + X_2^2}\,, \qquad \Phi = \tan^{-1}\frac{X_2}{X_1} \qquad (5.32)$$

(a) Find the general expression for the joint density $p_{R,\Phi}$.

(b) Specialize to a situation in which $X_1$ and $X_2$ are i.i.d. $N(0,1)$ random variables.

Solution, part (a): Finding the Jacobian involves taking partial derivatives in (5.32). However,
in this setting, taking the Jacobian the other way around, as in (5.30), is simpler:

$$x_1 = r\cos\phi\,, \qquad x_2 = r\sin\phi$$

so that

$$\mathbf{J}(\mathrm{rect};\mathrm{polar}) = \begin{pmatrix} \frac{\partial x_1}{\partial r} & \frac{\partial x_1}{\partial \phi} \\ \frac{\partial x_2}{\partial r} & \frac{\partial x_2}{\partial \phi} \end{pmatrix} = \begin{pmatrix} \cos\phi & -r\sin\phi \\ \sin\phi & r\cos\phi \end{pmatrix}$$

We see that

$$\det(\mathbf{J}(\mathrm{rect};\mathrm{polar})) = r\,(\cos^2\phi + \sin^2\phi) = r$$

Noting that the rectangular-polar transformation is one-to-one, we have from (5.31) that

$$p_{R,\Phi}(r,\phi) = \left. p_{X_1,X_2}(x_1,x_2)\, |\det(\mathbf{J}(\mathrm{rect};\mathrm{polar}))| \right|_{x_1 = r\cos\phi,\ x_2 = r\sin\phi} = r\, p_{X_1,X_2}(r\cos\phi, r\sin\phi)\,, \quad r \ge 0,\ 0 \le \phi \le 2\pi \qquad (5.33)$$

Solution, part (b): For $X_1$, $X_2$ i.i.d. $N(0,1)$, we have


$$p_{X_1,X_2}(x_1,x_2) = p_{X_1}(x_1)\, p_{X_2}(x_2) = \frac{1}{\sqrt{2\pi}} e^{-x_1^2/2}\, \frac{1}{\sqrt{2\pi}} e^{-x_2^2/2}$$

Plugging into (5.33) and simplifying, we obtain

$$p_{R,\Phi}(r,\phi) = \frac{r}{2\pi} e^{-r^2/2}\,, \quad r \ge 0,\ 0 \le \phi \le 2\pi$$

We can find the marginal densities of $R$ and $\Phi$ by integrating out the other variable, but in this
case, we can find them by inspection, since the joint density clearly decomposes into a product of
functions of $r$ and $\phi$ alone. With appropriate normalization, each of these functions is a marginal
density. We can now infer that $R$ and $\Phi$ are independent, with

$$p_R(r) = r e^{-r^2/2}\,, \quad r \ge 0$$

and

$$p_\Phi(\phi) = \frac{1}{2\pi}\,, \quad 0 \le \phi \le 2\pi$$

The amplitude $R$ in this case follows a Rayleigh distribution, while the phase $\Phi$ is uniformly
distributed over $[0, 2\pi]$.

5.5  Expectation 

We  now  discuss  computation  of  statistical  averages,  which  are  often  the  performance  measures 
based  on  which  a  system  design  is  evaluated. 

Expectation: The expectation, or statistical average, of a function of a random variable $X$ is
defined as

$$E[g(X)] = \int g(x)\, p(x)\, dx \quad \text{(continuous random variable)}$$

$$E[g(X)] = \sum_x g(x)\, p(x) \quad \text{(discrete random variable)} \qquad (5.34)$$

Note that the expectation of a deterministic constant, therefore, is simply the constant itself.

Expectation is a linear operator: We have

$$E[a_1 X_1 + a_2 X_2 + b] = a_1 E[X_1] + a_2 E[X_2] + b$$

where $a_1$, $a_2$, $b$ are any constants.
Mean: The mean of a random variable $X$ is $E[X]$.

Variance:  The  variance  of  a  random  variable  X  is  a  measure  of  how  much  it  fluctuates  around 
its  mean: 

$$\mathrm{var}(X) = E\left[(X - E[X])^2\right] \qquad (5.35)$$

Expanding out the square, we have

$$\mathrm{var}(X) = E\left[X^2 - 2X E[X] + (E[X])^2\right]$$

Using the linearity of expectation, we can simplify to obtain the following alternative formula
for variance:

$$\mathrm{var}(X) = E[X^2] - (E[X])^2 \qquad (5.36)$$

The  square  root  of  the  variance  is  called  the  standard  deviation. 

Effect of Scaling and Translation: For $Y = aX + b$, it is left as an exercise to show that

$$E[Y] = E[aX + b] = a E[X] + b\,, \qquad \mathrm{var}(Y) = a^2\, \mathrm{var}(X) \qquad (5.37)$$

Normalizing to zero mean and unit variance: We can specialize (5.37) to $Y = \frac{X - E[X]}{\sqrt{\mathrm{var}(X)}}$, to
see that $E[Y] = 0$ and $\mathrm{var}(Y) = 1$.

Example 5.5.1 (PDF after scaling and translation): If $X$ has density $p_X(x)$, then $Y =
(X - a)/b$ has density

$$p_Y(y) = |b|\, p_X(by + a) \qquad (5.38)$$

This follows from a straightforward application of Method 2 in Section 5.4. Specializing to a
Gaussian random variable $X \sim N(m, v^2)$ with mean $m$ and variance $v^2$ (we verify the mean and
variance in Example 5.5.3), consider a normalized version $Y = (X - m)/v$. Applying (5.38) to the Gaussian
density, we obtain:

$$p_Y(y) = v\, \frac{1}{\sqrt{2\pi v^2}}\, e^{-\frac{(vy + m - m)^2}{2v^2}} = \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}$$

which can be recognized as an $N(0,1)$ density. Thus, if $X \sim N(m, v^2)$, then $Y = \frac{X - m}{v} \sim N(0,1)$
is a standard Gaussian random variable. This enables us to express probabilities involving
Gaussian random variables compactly in terms of the CDF and CCDF of a standard Gaussian
random variable, as we see later when we deal extensively with Gaussian random variables when
modeling digital communication systems.


Moments: The $n$th moment of a random variable $X$ is defined as $E[X^n]$. From (5.36), we see
that specifying the mean and variance is equivalent to specifying the first and second moments.
Indeed, it is worth rewriting (5.36) as an explicit reminder that the second moment is the sum
of the squared mean and the variance:

$$E[X^2] = (E[X])^2 + \mathrm{var}(X) \qquad (5.39)$$


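Example 5.5.2 (Moments of an exponential random variable): Suppose that $X \sim
\mathrm{Exp}(\lambda)$. We compute its mean using integration by parts, as follows:

$$E[X] = \int_0^\infty x \lambda e^{-\lambda x}\, dx = \left. -x e^{-\lambda x} \right|_0^\infty + \int_0^\infty \left(\frac{d}{dx} x\right) e^{-\lambda x}\, dx = \int_0^\infty e^{-\lambda x}\, dx = \frac{1}{\lambda} \qquad (5.40)$$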
Similarly, using integration by parts twice, we can show that

$$E[X^2] = \frac{2}{\lambda^2}$$

Using (5.36), we obtain

$$\mathrm{var}(X) = E[X^2] - (E[X])^2 = \frac{1}{\lambda^2} \qquad (5.41)$$

In general, we can use repeated integration by parts to evaluate higher moments of the exponential
random variable to obtain

$$E[X^n] = \int_0^\infty x^n \lambda e^{-\lambda x}\, dx = \frac{n!}{\lambda^n}\,, \quad n = 1, 2, 3, \ldots$$

(A proof of the preceding formula using mathematical induction is left as an exercise.)


As a natural follow-up to the computations in the preceding example, let us introduce the gamma
function, which is useful for evaluating integrals associated with expectation computations for
several important random variables.

Gamma function: The Gamma function, $\Gamma(x)$, is defined as

$$\Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\, dt\,, \quad x > 0$$

In general, integration by parts can be used to show that

$$\Gamma(x + 1) = x\, \Gamma(x)\,, \quad x > 0 \qquad (5.42)$$

Noting that $\Gamma(1) = 1$, we can now use induction to specify the Gamma function for integer
arguments:

$$\Gamma(n) = (n - 1)!\,, \quad n = 1, 2, 3, \ldots \qquad (5.43)$$

This is exactly the same computation as we did in Example 5.5.2: $\Gamma(n)$ equals the $(n - 1)$th
moment of an exponential random variable with $\lambda = 1$ (and hence mean $\frac{1}{\lambda} = 1$).

The Gamma function can also be computed for non-integer arguments. Just as integer arguments
of the Gamma function are useful for exponential random variables, "integer-plus-half" arguments
are useful for evaluating the moments of Gaussian random variables. We can evaluate these using
(5.42) given the value of the gamma function at $x = 1/2$:

$$\Gamma(1/2) = \int_0^\infty t^{-1/2} e^{-t}\, dt = \sqrt{\pi}$$

For example, we can infer that

$$\Gamma(5/2) = (3/2)(1/2)\,\Gamma(1/2) = \frac{3\sqrt{\pi}}{4} \qquad (5.44)$$
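
These Gamma function values are easy to confirm numerically. The Python sketch below uses the standard library function `math.gamma`; the variable names are illustrative only.

```python
import math

# Integer arguments: Gamma(n) should equal (n - 1)! as in (5.43).
integer_checks = [(n, math.gamma(n), math.factorial(n - 1)) for n in range(1, 7)]

# Half-integer arguments built up from Gamma(1/2) = sqrt(pi) using the
# recursion Gamma(x + 1) = x * Gamma(x) of (5.42).
gamma_half = math.sqrt(math.pi)
gamma_three_halves = 0.5 * gamma_half          # Gamma(3/2) = (1/2) Gamma(1/2)
gamma_five_halves = 1.5 * gamma_three_halves   # Gamma(5/2) = (3/2) Gamma(3/2)
```

Comparing `gamma_five_halves` with `math.gamma(2.5)` provides an independent check of (5.44).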


Example 5.5.3 (Mean and variance of a Gaussian random variable): We now show that
$X \sim N(m, v^2)$ has mean $m$ and variance $v^2$. The mean of $X$ is given by the following expression:

$$E[X] = \int_{-\infty}^{\infty} x\, \frac{1}{\sqrt{2\pi v^2}}\, e^{-\frac{(x-m)^2}{2v^2}}\, dx$$

Let us first consider the change of variables $t = (x - m)/v$, so that $dx = v\, dt$. Then

$$E[X] = \int_{-\infty}^{\infty} (tv + m)\, \frac{1}{\sqrt{2\pi v^2}}\, e^{-t^2/2}\, v\, dt$$

Note that $t e^{-t^2/2}$ is an odd function, and therefore integrates out to zero over the real line. We
therefore obtain

$$E[X] = m \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt = m$$

recognizing that the integrand on the extreme right-hand side is the $N(0,1)$ PDF, which must
integrate to one. The variance is given by

$$\mathrm{var}(X) = E[(X - m)^2] = \int_{-\infty}^{\infty} (x - m)^2\, \frac{1}{\sqrt{2\pi v^2}}\, e^{-\frac{(x-m)^2}{2v^2}}\, dx$$

With a change of variables $t = (x - m)/v$ as before, we obtain

$$\mathrm{var}(X) = v^2 \int_{-\infty}^{\infty} t^2\, \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt = 2v^2 \int_0^{\infty} t^2\, \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt$$

since the integrand is an even function of $t$. Substituting $z = t^2/2$, so that $dz = t\, dt = \sqrt{2z}\, dt$,
we obtain

$$\mathrm{var}(X) = 2v^2 \int_0^{\infty} 2z\, \frac{1}{\sqrt{2\pi}}\, e^{-z}\, \frac{dz}{\sqrt{2z}} = 2v^2\, \frac{1}{\sqrt{\pi}} \int_0^{\infty} z^{1/2} e^{-z}\, dz = 2v^2\, \frac{1}{\sqrt{\pi}}\, \Gamma(3/2) = v^2$$

since $\Gamma(3/2) = (1/2)\,\Gamma(1/2) = \sqrt{\pi}/2$.

The change of variables in the computations in the preceding example is actually equivalent
to transforming the $N(m, v^2)$ random variable that we started with to a standard Gaussian
$N(0,1)$ random variable as in Example 5.5.1. As we mentioned earlier (this is important enough
to be worth repeating), when we handle Gaussian random variables more extensively in later
chapters, we prefer making the transformation up front when computing probabilities, rather
than changing variables inside integrals.

As a final example, we show that the mean of a Poisson random variable with parameter $\lambda$ is
equal to $\lambda$.


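Example 5.5.4 (Mean of a Poisson random variable): The mean is given by

$$E[X] = \sum_{k=0}^{\infty} k\, P[X = k] = \sum_{k=1}^{\infty} k\, \frac{\lambda^k}{k!}\, e^{-\lambda}$$

where we have dropped the $k = 0$ term from the extreme right-hand side, since it does not
contribute to the mean. Noting that $\frac{k}{k!} = \frac{1}{(k-1)!}$, we have

$$E[X] = \sum_{k=1}^{\infty} \frac{\lambda^k}{(k-1)!}\, e^{-\lambda} = \lambda e^{-\lambda} \sum_{k=1}^{\infty} \frac{\lambda^{k-1}}{(k-1)!} = \lambda$$

since

$$\sum_{k=1}^{\infty} \frac{\lambda^{k-1}}{(k-1)!} = \sum_{l=0}^{\infty} \frac{\lambda^l}{l!} = e^{\lambda}$$

where we set $l = k - 1$ to get an easily recognized form for the series expansion of an exponential.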
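
The series manipulation above can be checked by summing the defining series directly. The following Python sketch truncates the sum after a fixed number of terms; the function name, the truncation length, and the test value of $\lambda$ are illustrative assumptions.

```python
import math


def poisson_mean_truncated(lam, terms=200):
    """Approximate E[X] = sum_k k P[X = k] for X ~ Poisson(lam) by a truncated sum."""
    pmf = math.exp(-lam)  # P[X = 0]
    mean = 0.0
    for k in range(1, terms):
        pmf *= lam / k    # P[X = k] = P[X = k - 1] * lam / k
        mean += k * pmf
    return mean
```

For moderate values of $\lambda$ (for example $\lambda = 3.5$), the truncated sum should agree with $\lambda$ to within numerical precision, since the neglected tail terms decay faster than exponentially.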
5.5.1 Expectation for random vectors

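So far, we have talked about expectations involving a single random variable. Expectations
with multiple random variables are defined in exactly the same way as in (5.34), replacing the
scalar random variable and the corresponding dummy variable for summation or integration by
a vector:

$$E[g(\mathbf{X})] = E[g(X_1, \ldots, X_n)] = \int g(\mathbf{x})\, p(\mathbf{x})\, d\mathbf{x} = \int_{x_1=-\infty}^{\infty} \cdots \int_{x_n=-\infty}^{\infty} g(x_1, \ldots, x_n)\, p(x_1, \ldots, x_n)\, dx_1 \cdots dx_n \quad \text{(jointly continuous random variables)}$$

$$E[g(\mathbf{X})] = E[g(X_1, \ldots, X_n)] = \sum_{\mathbf{x}} g(\mathbf{x})\, p(\mathbf{x}) = \sum_{x_1} \cdots \sum_{x_n} g(x_1, \ldots, x_n)\, p(x_1, \ldots, x_n) \quad \text{(discrete random variables)} \qquad (5.45)$$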

Product  of  expectations  for  independent  random  variables:  When  the  random  variables 
involved  are  independent,  and  the  function  whose  expectation  is  to  be  evaluated  decomposes 
into  a  product  of  functions  of  each  individual  random  variable,  then  the  preceding  computation 
involves  a  product  of  expectations,  each  involving  only  one  random  variable: 

$$E[g_1(X_1) \cdots g_n(X_n)] = E[g_1(X_1)] \cdots E[g_n(X_n)]\,, \quad X_1, \ldots, X_n \text{ independent} \qquad (5.46)$$

Example 5.5.5 (Computing an expectation involving independent random variables):
Suppose that $X_1 \sim N(1, 1)$ and $X_2 \sim N(-3, 4)$ are independent. Find $E[(X_1 + X_2)^2]$.

Solution: We have

$$E[(X_1 + X_2)^2] = E[X_1^2 + X_2^2 + 2 X_1 X_2]$$

We can now use linearity to compute the expectations of each of the three terms on the
right-hand side separately. We obtain $E[X_1^2] = (E[X_1])^2 + \mathrm{var}(X_1) = 1^2 + 1 = 2$, $E[X_2^2] = (E[X_2])^2 +
\mathrm{var}(X_2) = (-3)^2 + 4 = 13$, and $E[2 X_1 X_2] = 2 E[X_1] E[X_2] = 2(1)(-3) = -6$, so that

$$E[(X_1 + X_2)^2] = 2 + 13 - 6 = 9$$

Variance is a measure of how a random variable fluctuates around its mean. Covariance, defined
next,  is  a  measure  of  how  the  fluctuations  of  two  random  variables  around  their  means  are 
correlated. 

Covariance: The covariance of $X_1$ and $X_2$ is defined as

$$\mathrm{cov}(X_1, X_2) = E\left[(X_1 - E[X_1])(X_2 - E[X_2])\right] \qquad (5.47)$$

As with variance, we can also obtain the following alternative formula:

$$\mathrm{cov}(X_1, X_2) = E[X_1 X_2] - E[X_1] E[X_2] \qquad (5.48)$$

Variance is the covariance of a random variable with itself: It is immediate from the
definition that

$$\mathrm{var}(X) = \mathrm{cov}(X, X)$$

Uncorrelated random variables: Random variables $X_1$ and $X_2$ are said to be uncorrelated
if $\mathrm{cov}(X_1, X_2) = 0$.

Independent random variables are uncorrelated: If $X_1$ and $X_2$ are independent, then they
are uncorrelated.

This is easy to see from (5.48), since $E[X_1 X_2] = E[X_1] E[X_2]$ using (5.46).


Uncorrelated random variables need not be independent: Consider $X \sim N(0,1)$ and
$Y = X^2$. We see that $E[XY] = E[X^3] = 0$ by the symmetry of the $N(0,1)$ density around
the origin. Since $E[X] = 0$ as well, we have

$$\mathrm{cov}(X, Y) = E[XY] - E[X] E[Y] = 0$$

Clearly, $X$ and $Y$ are not independent, since knowing the value of $X$ determines the value of $Y$.

As we discuss in the next section, uncorrelated jointly Gaussian random variables are indeed
independent. The joint distribution of such random variables is determined by means and
covariances, hence we also postpone more detailed discussion of covariance computation until our
study of joint Gaussianity.


5.6  Gaussian  Random  Variables 


We  begin  by  repeating  the  definition  of  a  Gaussian  random  variable. 

Gaussian random variable: The random variable $X$ is said to follow a Gaussian, or normal,
distribution if its density is of the form:

$$p(x) = \frac{1}{\sqrt{2\pi v^2}} \exp\left(-\frac{(x - m)^2}{2v^2}\right)\,, \quad -\infty < x < \infty \qquad (5.49)$$

where $m = E[X]$ is the mean of $X$, and $v^2 = \mathrm{var}(X)$ is the variance of $X$. The Gaussian density
is therefore completely characterized by its mean and variance.

Notation for Gaussian distribution: We use $N(m, v^2)$ to denote a Gaussian distribution
with mean $m$ and variance $v^2$, and use the shorthand $X \sim N(m, v^2)$ to denote that a random
variable $X$ follows this distribution.

We have already noted the characteristic bell shape of the Gaussian PDF in the example plotted
in Figure 5.5: the bell is centered around the mean, and its width is determined by the variance.
We now develop a detailed framework for efficient computations involving Gaussian random
variables.

Standard Gaussian random variable: A zero mean, unit variance Gaussian random variable,
$X \sim N(0,1)$, is termed a standard Gaussian random variable.

An important property of Gaussian random variables is that they remain Gaussian under scaling
and translation. Suppose that $X \sim N(m, v^2)$. Define $Y = aX + b$, where $a$, $b$ are constants
(assume $a \ne 0$ to avoid triviality). The density of $Y$ can be found as follows:

$$p(y) = \left. \frac{p(x)}{|dy/dx|} \right|_{x = (y - b)/a}$$

Noting that $\frac{dy}{dx} = a$, and plugging in (5.49), we obtain

$$p(y) = \frac{1}{|a| \sqrt{2\pi v^2}} \exp\left(-\frac{((y - b)/a - m)^2}{2v^2}\right) = \frac{1}{\sqrt{2\pi a^2 v^2}} \exp\left(-\frac{(y - (am + b))^2}{2a^2 v^2}\right)$$

Comparing with (5.49), we can see that $Y$ is also Gaussian, with mean $m_Y = am + b$ and variance
$v_Y^2 = a^2 v^2$. This is important enough to summarize and restate.

Gaussianity  is  preserved  under  scaling  and  translation 

If $X \sim N(m, v^2)$, then $Y = aX + b \sim N(am + b, a^2 v^2)$.

As a consequence of the preceding result, any Gaussian random variable can be scaled and
translated to obtain a “standard” Gaussian random variable with zero mean and unit variance.
For $X \sim N(m, v^2)$, $Y = aX + b \sim N(0,1)$ if $am + b = 0$ and $a^2 v^2 = 1$, which we can satisfy by choosing $a = 1/v$, $b = -m/v$.
That is, $Y = (X - m)/v \sim N(0,1)$.

Standard Gaussian random variable

A standard Gaussian random variable $N(0,1)$ has mean zero and variance one.

Conversion of a Gaussian random variable into standard form

If $X \sim N(m, v^2)$, then $\frac{X - m}{v} \sim N(0,1)$.

As the following example illustrates, this enables us to express probabilities involving any
Gaussian random variable as probabilities involving a standard Gaussian random variable.

Example 5.6.1 Suppose that $X \sim N(5, 9)$. Then $(X - 5)/\sqrt{9} = (X - 5)/3 \sim N(0,1)$. Any
probability involving $X$ can now be expressed as a probability involving a standard Gaussian
random variable. For example,

$$P[X > 11] = P[(X - 5)/3 > (11 - 5)/3] = P[N(0,1) > 2]$$

We therefore set aside special notation for the cumulative distribution function (CDF) $\Phi(x)$ and
complementary cumulative distribution function (CCDF) $Q(x)$ of a standard Gaussian random
variable. By virtue of the standard form conversion, we can now express probabilities involving
any Gaussian random variable in terms of the $\Phi$ or $Q$ functions. The definitions of these functions
are illustrated in Figure 5.12, and the corresponding formulas are specified below.


Figure 5.12: The $\Phi$ and $Q$ functions are obtained by integrating the $N(0,1)$ density over
appropriate intervals.

$$\Phi(x) = P[N(0,1) \le x] = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{t^2}{2}\right) dt \qquad (5.50)$$

$$Q(x) = P[N(0,1) > x] = \int_{x}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{t^2}{2}\right) dt \qquad (5.51)$$

See Figure 5.13 for a plot of these functions. By definition, $\Phi(x) + Q(x) = 1$. Furthermore, by the
symmetry of the Gaussian density around zero, $Q(-x) = \Phi(x)$. Combining these observations,
we note that $Q(-x) = 1 - Q(x)$, so that it suffices to consider only positive arguments for the
$Q$ function in order to compute probabilities of interest.

Figure 5.13: The $\Phi$ and $Q$ functions.

Let us now consider a few more Gaussian probability computations.


Example 5.6.2 $X$ is a Gaussian random variable with mean $m = -5$ and variance $v^2 = 4$. Find
expressions in terms of the $Q$ function with positive arguments for the following probabilities:
$P[X > 3]$, $P[X < -8]$, $P[X < -1]$, $P[3 < X < 6]$, $P[X^2 - 2X > 15]$.

Solution: We solve this problem by normalizing $X$ to a standard Gaussian random variable
$\frac{X - m}{v} = \frac{X + 5}{2}$:

$$P[X > 3] = P\left[\frac{X+5}{2} > \frac{3+5}{2} = 4\right] = Q(4)$$

$$P[X < -8] = P\left[\frac{X+5}{2} < \frac{-8+5}{2} = -1.5\right] = \Phi(-1.5) = Q(1.5)$$

$$P[X < -1] = P\left[\frac{X+5}{2} < \frac{-1+5}{2} = 2\right] = \Phi(2) = 1 - Q(2)$$

$$P[3 < X < 6] = P\left[4 = \frac{3+5}{2} < \frac{X+5}{2} < \frac{6+5}{2} = 5.5\right] = \Phi(5.5) - \Phi(4) = (1 - Q(5.5)) - (1 - Q(4)) = Q(4) - Q(5.5)$$

Computation of the probability that $X^2 - 2X > 15$ requires that we express this event in terms
of simpler events by factorization:

$$X^2 - 2X - 15 = X^2 - 5X + 3X - 15 = (X - 5)(X + 3)$$

This shows that $X^2 - 2X > 15$, or $X^2 - 2X - 15 > 0$, if and only if $X - 5 > 0$ and $X + 3 > 0$,
or $X - 5 < 0$ and $X + 3 < 0$. The first event simplifies to $X > 5$, and the second to $X < -3$, so
that the desired event is a union of two mutually exclusive events. We therefore have

$$P[X^2 - 2X > 15] = P[X > 5] + P[X < -3] = Q\left(\frac{5+5}{2}\right) + \Phi\left(\frac{-3+5}{2}\right) = Q(5) + \Phi(1) = Q(5) + 1 - Q(1)$$

Interpreting the transformation to standard Gaussian: For $X \sim N(m, v^2)$, the
transformation to standard Gaussian tells us that

$$P[X > m + \alpha v] = P\left[\frac{X - m}{v} > \alpha\right] = Q(\alpha)$$

That is, the tail probability of a Gaussian random variable depends only on the number of
standard deviations $\alpha$ away from the mean. More generally, the transformation is equivalent to
the observation that the probability of an infinitesimal interval $[x, x + \Delta x]$ depends only on its
normalized distance from the mean, $\frac{x - m}{v}$, and its normalized length $\frac{\Delta x}{v}$:

$$P[x \le X \le x + \Delta x] \approx p(x)\, \Delta x = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{x - m}{v}\right)^2\right) \frac{\Delta x}{v}$$

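Relating the Q function to the error function: Mathematical software packages such as
Matlab often list the error function and the complementary error function, defined for $x \ge 0$ by

$$\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\, dt$$

$$\mathrm{erfc}(x) = 1 - \mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_x^{\infty} e^{-t^2}\, dt$$

Recognizing the form of the $N(0, \frac{1}{2})$ density, given by $\frac{1}{\sqrt{\pi}} e^{-t^2}$, we see that

$$\mathrm{erf}(x) = 2P[0 \le X \le x]\,, \qquad \mathrm{erfc}(x) = 2P[X > x]$$

where $X \sim N(0, \frac{1}{2})$. Transforming to standard Gaussian as usual, we see that

$$\mathrm{erfc}(x) = 2P[X > x] = 2P\left[\frac{X - 0}{\sqrt{1/2}} > \frac{x - 0}{\sqrt{1/2}}\right] = 2Q\left(x\sqrt{2}\right)$$

We can invert this to compute the $Q$ function for positive arguments in terms of the
complementary error function, as follows:

$$Q(x) = \frac{1}{2}\, \mathrm{erfc}\left(\frac{x}{\sqrt{2}}\right)\,, \quad x \ge 0 \qquad (5.52)$$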

For $x < 0$, we can compute $Q(x) = 1 - Q(-x)$ using the preceding equation to evaluate the
right-hand side. While the Communications System Toolbox in Matlab has the Q function built
in as qfunc(·), we provide a Matlab code fragment for computing the Q function based on the
complementary error function (available without subscription to separate toolboxes) below.

Code  Fragment  5.6.1  (Computing  the  Q  function) 

%Q function computed using erfc (works for vector inputs)
function z = qfunction(x)
b = (x>=0);
y1 = b.*x; %select the positive entries of x
y2 = (1-b).*(-x); %select, and flip the sign of, negative entries in x
z1 = (0.5*erfc(y1./sqrt(2))).*b; %Q(x) for positive entries in x
z2 = (1-0.5*erfc(y2./sqrt(2))).*(1-b); %Q(x) = 1 - Q(-x) for negative entries in x
z = z1+z2; %final answer (works for x with positive or negative entries)

Example 5.6.3 (Binary on-off keying in Gaussian noise) A received sample $Y$ in a
communication system is modeled as follows: $Y = m + N$ if 1 is sent, and $Y = N$ if 0 is sent, where
$N \sim N(0, v^2)$ is the contribution of the receiver noise to the sample, and where $|m|$ is a measure
of the signal strength. Assuming that $m > 0$, suppose that we use the simple decision rule that
splits the difference between the average values of the observation under the two scenarios: say
that 1 is sent if $Y > m/2$, and say that 0 is sent if $Y \le m/2$. Assuming that both 0 and 1
are equally likely to be sent, the signal power is $(1/2)m^2 + (1/2)0^2 = m^2/2$. The noise power is
$E[N^2] = v^2$. Thus, $\mathrm{SNR} = \frac{m^2}{2v^2}$.

(a) What is the conditional probability of error, conditioned on 0 being sent?

(b) What is the conditional probability of error, conditioned on 1 being sent?

(c) What is the (unconditional) probability of error if 0 and 1 are equally likely to have been
sent?

(d) What is the error probability for SNR of 13 dB?

Solution:

(a) Since $Y \sim N(0, v^2)$ given that 0 is sent, the conditional probability of error is given by

$$P_{e|0} = P[\text{say 1} \mid \text{0 sent}] = P[Y > m/2 \mid \text{0 sent}] = Q\left(\frac{m/2 - 0}{v}\right) = Q\left(\frac{m}{2v}\right)$$

(b) Since $Y \sim N(m, v^2)$ given that 1 is sent, the conditional probability of error is given by

$$P_{e|1} = P[\text{say 0} \mid \text{1 sent}] = P[Y \le m/2 \mid \text{1 sent}] = \Phi\left(\frac{m/2 - m}{v}\right) = \Phi\left(-\frac{m}{2v}\right) = Q\left(\frac{m}{2v}\right)$$

(c) If $\pi_0$ is the probability of sending 0, then the unconditional error probability is given by

$$P_e = \pi_0 P_{e|0} + (1 - \pi_0) P_{e|1} = Q\left(\frac{m}{2v}\right) = Q\left(\sqrt{\mathrm{SNR}/2}\right)$$

regardless of $\pi_0$ for this particular decision rule.

(d) For SNR of 13 dB, we have $\mathrm{SNR(raw)} = 10^{\mathrm{SNR(dB)}/10} = 10^{1.3} \approx 20$, so that the error
probability evaluates to $P_e = Q(\sqrt{10}) = 7.8 \times 10^{-4}$.
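
Part (d) can be reproduced with a few lines of Python. This is only a sketch: the function names are our own, and the $Q$ function is computed from `math.erfc` via $Q(x) = \frac{1}{2}\mathrm{erfc}(x/\sqrt{2})$.

```python
import math


def q_function(x):
    """Q(x) = P[N(0,1) > x], computed as 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def ook_error_probability(snr_db):
    """Error probability Q(sqrt(SNR/2)) for on-off keying with threshold m/2."""
    snr = 10.0 ** (snr_db / 10.0)  # convert SNR from dB to raw scale
    return q_function(math.sqrt(snr / 2.0))


pe_13db = ook_error_probability(13.0)
```

Because the code uses the exact value $10^{1.3}$ rather than the rounded value 20, `pe_13db` is close to, but not exactly equal to, the figure quoted in part (d).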

Figure 5.14 shows the probability of error on a log scale, plotted against the SNR in dB. This
is the first example of the many error probability plots that we will see in this chapter.

Figure 5.14: Probability of error versus SNR for on-off-keying.

A Matlab code fragment (cosmetic touches omitted) for generating Figure 5.14 in Example 5.6.3
is given below.

Code Fragment 5.6.2 (Error probability computation and plotting)

%Plot of error probability versus SNR for on-off keying
snrdb = -5:0.1:15; %vector of SNRs (in dB) for which to evaluate error prob
snr = 10.^(snrdb/10); %vector of raw SNRs
pe = qfunction(sqrt(snr/2)); %vector of error probabilities
%plot error prob on log scale versus SNR in dB
semilogy(snrdb,pe);
ylabel('Error Probability');
xlabel('SNR (dB)');


The preceding example illustrates a more general observation for signaling in AWGN: the probability
of error involves terms such as $Q(\sqrt{a\,\mathrm{SNR}})$, where the scale factor $a$ depends on properties
of the signal constellation, and SNR is the signal-to-noise ratio. It is therefore of interest to
understand how the error probability decays with SNR. As shown in Appendix 5.A, there are tight
analytical bounds for the Q function which can be used to deduce that it decays exponentially
with its argument, as stated in the following.

Asymptotics of Q(x) for large arguments: For large $x > 0$, the exponential decay of the Q
function dominates. We denote this by

$$Q(x) \doteq e^{-x^2/2} \qquad (5.53)$$

which is shorthand for the following limiting result:

$$\lim_{x \to \infty} \frac{\log Q(x)}{-x^2/2} = 1 \qquad (5.54)$$


These  asymptotics  play  a  key  role  in  design  of  communication  systems.  Since  events  that  cause 
bit errors have probabilities involving terms such as $Q(\sqrt{a\,\mathrm{SNR}}) \doteq e^{-a\,\mathrm{SNR}/2}$, when there are
multiple events that can cause bit errors, the ones with the smallest rates of decay $a$ dominate
performance.  We  can  therefore  focus  on  these  worst-case  events  in  our  designs  for  moderate  and 
high  SNR.  This  simplistic  view  does  not  quite  hold  in  heavily  coded  systems  operating  at  low 
SNR,  but  is  still  an  excellent  perspective  for  arriving  at  a  coarse  link  design. 
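
The limiting result (5.54) can be checked numerically. The following is a minimal sketch, assuming that the Q function is written in terms of the built-in erfc (the handle name Q and the particular test points are choices made here for illustration, not part of the text's code fragments).

```matlab
% Numerical illustration of (5.54): log Q(x) / (-x^2/2) tends to 1 as x grows.
% Q is expressed through the built-in erfc so that the sketch is self-contained.
Q = @(x) 0.5*erfc(x/sqrt(2));      % Gaussian tail probability
x = [2 4 8 16 32];                 % increasingly large arguments (illustrative choice)
ratio = log(Q(x))./(-x.^2/2);      % ratio appearing in (5.54)
disp([x; ratio]);                  % second row decreases towards 1
```

The ratio approaches 1 rather slowly, because Q(x) also contains a factor that decays only polynomially in x; the exponential term dominates on a log scale. Arguments much larger than those used here would underflow in double precision.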


5.6.1  Joint  Gaussianity 

Often,  we  need  to  deal  with  multiple  Gaussian  random  variables  defined  on  the  same  probability 
space.  These  might  arise,  for  example,  when  we  sample  filtered  WGN.  In  many  situations  of 
interest,  not  only  are  such  random  variables  individually  Gaussian,  but  they  satisfy  a  stronger 
joint  Gaussianity  property.  Just  as  a  Gaussian  random  variable  is  characterized  by  its  mean  and 
variance,  jointly  Gaussian  random  variables  are  characterized  by  means  and  covariances.  We  are 
also  interested  in  what  happens  to  these  random  variables  under  linear  operations,  corresponding, 
for  example,  to  filtering.  Hence,  we  first  review  mean  and  covariance,  and  their  evolution  under 
linear  operations  and  translations,  for  arbitrary  random  variables  defined  on  the  same  probability 
space. 

Covariance: The covariance of random variables $X_1$ and $X_2$ measures the correlation between
how they vary around their means, and is given by

$$\mathrm{cov}(X_1, X_2) = E\left[(X_1 - E[X_1])(X_2 - E[X_2])\right] = E[X_1 X_2] - E[X_1]E[X_2]$$

The second formula is obtained from the first by multiplying out and simplifying:

$$E\left[(X_1 - E[X_1])(X_2 - E[X_2])\right] = E\left[X_1 X_2 - E[X_1]X_2 + E[X_1]E[X_2] - X_1 E[X_2]\right]$$
$$= E[X_1 X_2] - E[X_1]E[X_2] + E[X_1]E[X_2] - E[X_1]E[X_2] = E[X_1 X_2] - E[X_1]E[X_2]$$

where we use the linearity of the expectation operator to pull out constants.

Uncorrelatedness: $X_1$ and $X_2$ are said to be uncorrelated if $\mathrm{cov}(X_1, X_2) = 0$.

Independent random variables are uncorrelated: If $X_1$ and $X_2$ are independent, then

$$\mathrm{cov}(X_1, X_2) = E[X_1 X_2] - E[X_1]E[X_2] = E[X_1]E[X_2] - E[X_1]E[X_2] = 0$$

The converse is not true in general; that is, uncorrelated random variables need not be
independent. However, we shall see that jointly Gaussian uncorrelated random variables are indeed
independent.

Variance:  Note  that  the  variance  of  a  random  variable  is  its  covariance  with  itself: 

$$\mathrm{var}(X) = \mathrm{cov}(X, X) = E\left[(X - E[X])^2\right] = E[X^2] - (E[X])^2$$

The  use  of  matrices  and  vectors  provides  a  compact  way  of  representing  and  manipulating  means 
and  covariances,  especially  using  software  programs  such  as  Matlab.  Thus,  for  random  variables 
$X_1, \ldots, X_m$, we define the random vector $\mathbf{X} = (X_1, \ldots, X_m)^T$, and arrange the means and pairwise
covariances in a vector and matrix, respectively, as follows.

Mean vector and covariance matrix: Consider an arbitrary m-dimensional random vector
$\mathbf{X} = (X_1, \ldots, X_m)^T$. The $m \times 1$ mean vector of $\mathbf{X}$ is defined as $\mathbf{m}_X = E[\mathbf{X}] = (E[X_1], \ldots, E[X_m])^T$.
The $m \times m$ covariance matrix $\mathbf{C}_X$ has $(i,j)$th entry given by the covariance between the $i$th and
$j$th random variables:

$$\mathbf{C}_X(i,j) = \mathrm{cov}(X_i, X_j) = E\left[(X_i - E[X_i])(X_j - E[X_j])\right] = E[X_i X_j] - E[X_i]E[X_j]$$

More compactly,

$$\mathbf{C}_X = E\left[(\mathbf{X} - E[\mathbf{X}])(\mathbf{X} - E[\mathbf{X}])^T\right] = E[\mathbf{X}\mathbf{X}^T] - E[\mathbf{X}](E[\mathbf{X}])^T$$

Notes  on  covariance  computation:  Computations  of  variance  and  covariance  come  up  often 
when  we  deal  with  Gaussian  random  variables,  hence  it  is  useful  to  note  the  following  properties 
of  covariance. 

Property  1:  Covariance  is  unaffected  by  adding  constants. 

$$\mathrm{cov}(X + a, Y + b) = \mathrm{cov}(X, Y) \quad \text{for any constants } a, b$$

Covariance  provides  a  measure  of  the  correlation  between  random  variables  after  subtracting  out 
their  means,  hence  adding  constants  to  the  random  variables  (which  just  translates  their  means) 
does  not  affect  covariance. 

Property 2: Covariance is a bilinear function (i.e., it is linear in both its arguments).

$$\mathrm{cov}(a_1 X_1 + a_2 X_2, a_3 X_3 + a_4 X_4) = a_1 a_3\,\mathrm{cov}(X_1, X_3) + a_1 a_4\,\mathrm{cov}(X_1, X_4) + a_2 a_3\,\mathrm{cov}(X_2, X_3) + a_2 a_4\,\mathrm{cov}(X_2, X_4)$$


By  Property  1,  it  is  clear  that  we  can  always  consider  zero  mean,  or  centered,  versions  of  random 
variables  when  computing  the  covariance.  An  example  that  frequently  arises  in  performance 
analysis  of  communication  systems  is  a  random  variable  which  is  a  sum  of  a  deterministic  term 
(e.g., due to a signal), and a zero mean random term (e.g., due to noise). In this case, dropping
the  signal  term  is  often  convenient  when  computing  variance  or  covariance. 

Affine  transformations:  For  a  random  vector  X,  the  analogue  of  scaling  and  translating  a 
random  variable  is  a  linear  transformation  using  a  matrix,  together  with  a  translation.  Such  a 
transformation  is  called  an  affine  transformation.  That  is,  Y  =  AX+b  is  an  affine  transformation 
of X, where A is a deterministic matrix and b a deterministic vector.

Example 5.6.4 (Mean and variance after an affine transformation): Let $Y = X_1 - 2X_2 + 4$,
where $X_1$ has mean -1 and variance 4, $X_2$ has mean 2 and variance 9, and the covariance
$\mathrm{cov}(X_1, X_2) = -3$. Find the mean and variance of Y.

Solution: The mean is given by

$$E[Y] = E[X_1] - 2E[X_2] + 4 = -1 - 2(2) + 4 = -1$$

The variance is computed as

$$\mathrm{var}(Y) = \mathrm{cov}(Y, Y) = \mathrm{cov}(X_1 - 2X_2 + 4, X_1 - 2X_2 + 4)$$
$$= \mathrm{cov}(X_1, X_1) - 2\,\mathrm{cov}(X_1, X_2) - 2\,\mathrm{cov}(X_2, X_1) + 4\,\mathrm{cov}(X_2, X_2)$$

where the constant drops out because of Property 1. We therefore obtain that

$$\mathrm{var}(Y) = \mathrm{cov}(X_1, X_1) - 4\,\mathrm{cov}(X_1, X_2) + 4\,\mathrm{cov}(X_2, X_2) = 4 - 4(-3) + 4(9) = 52$$


Computations  such  as  those  in  the  preceding  example  can  be  compactly  represented  in  terms 
of  matrices  and  vectors,  which  is  particularly  useful  for  computations  for  random  vectors.  In 
general,  an  affine  transformation  maps  one  random  vector  into  another  (of  possibly  different 
dimension),  and  the  mean  vector  and  covariance  matrix  evolve  as  follows. 

Mean  and  covariance  evolution  under  affine  transformation 

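If $\mathbf{X}$ has mean $\mathbf{m}$ and covariance $\mathbf{C}$, and $\mathbf{Y} = \mathbf{A}\mathbf{X} + \mathbf{b}$,
then $\mathbf{Y}$ has mean $\mathbf{m}_Y = \mathbf{A}\mathbf{m} + \mathbf{b}$ and covariance $\mathbf{C}_Y = \mathbf{A}\mathbf{C}\mathbf{A}^T$.

To see this, first compute the mean vector of Y using the linearity of the expectation operator:

$$\mathbf{m}_Y = E[\mathbf{Y}] = E[\mathbf{A}\mathbf{X} + \mathbf{b}] = \mathbf{A}E[\mathbf{X}] + \mathbf{b} = \mathbf{A}\mathbf{m} + \mathbf{b} \qquad (5.55)$$

This also implies that the "zero mean" version of Y is given by

$$\mathbf{Y} - E[\mathbf{Y}] = (\mathbf{A}\mathbf{X} + \mathbf{b}) - (\mathbf{A}\mathbf{m} + \mathbf{b}) = \mathbf{A}(\mathbf{X} - \mathbf{m})$$

so that the covariance matrix of Y is given by

$$\mathbf{C}_Y = E\left[(\mathbf{Y} - E[\mathbf{Y}])(\mathbf{Y} - E[\mathbf{Y}])^T\right] = E\left[\mathbf{A}(\mathbf{X} - \mathbf{m})(\mathbf{X} - \mathbf{m})^T\mathbf{A}^T\right] = \mathbf{A}\mathbf{C}\mathbf{A}^T \qquad (5.56)$$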

Note  that  the  dimensions  of  X  and  Y  can  be  different:  X  can  be  m  x  1,  A  can  be  n  x  m,  and  Y, 
b  can  be  n  x  1,  where  m,  n  are  arbitrary.  We  also  note  below  that  mean  and  covariance  evolve 
separately  under  such  transformations. 

Mean  and  covariance  evolve  separately  under  affine  transformations:  The  mean  of  Y 
depends  only  on  the  mean  of  X,  and  the  covariance  of  Y  depends  only  on  the  covariance  of  X. 
Furthermore,  the  additive  constant  b  in  the  transformation  does  not  affect  the  covariance,  since 
it  influences  only  the  mean  of  Y. 

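Example 5.6.4 redone: We can check that we get the same result as before by setting

$$\mathbf{m}_X = \begin{pmatrix} -1 \\ 2 \end{pmatrix}, \quad \mathbf{C}_X = \begin{pmatrix} 4 & -3 \\ -3 & 9 \end{pmatrix}, \quad \mathbf{A} = (1 \;\; -2), \quad b = 4 \qquad (5.57)$$

and applying (5.55) and (5.56).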
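
The matrix form of Example 5.6.4 takes only a few lines in Matlab. The sketch below simply evaluates (5.55) and (5.56) for the numbers of the example; the variable names mX, CX, mY and CY are chosen here for illustration.

```matlab
% Example 5.6.4 via the affine-transformation formulas (5.55) and (5.56)
mX = [-1; 2];            % mean vector of (X1, X2)^T
CX = [4 -3; -3 9];       % covariance matrix of (X1, X2)^T
A  = [1 -2];             % Y = X1 - 2*X2 + 4
b  = 4;
mY = A*mX + b;           % mean of Y, equals -1
CY = A*CX*A';            % variance of Y (a 1 x 1 covariance matrix), equals 52
```

Note that b enters only the mean, in line with the observation that mean and covariance evolve separately.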

Jointly Gaussian random variables, or Gaussian random vectors: Random variables
$X_1, \ldots, X_m$ defined on a common probability space are said to be jointly Gaussian, or the $m \times 1$
random vector $\mathbf{X} = (X_1, \ldots, X_m)^T$ is termed a Gaussian random vector, if any linear combination
of these random variables is a Gaussian random variable. That is, for any scalar constants
$a_1, \ldots, a_m$, the random variable $a_1 X_1 + \ldots + a_m X_m$ is Gaussian.

A Gaussian random vector is completely characterized by its mean vector and covariance
matrix: This is a generalization of the observation that a Gaussian random variable is
completely characterized by its mean and variance. We derive this in Problem 5.47, but provide
an intuitive argument here. The definition of joint Gaussianity only requires us to characterize
the distribution of an arbitrarily chosen linear combination of $X_1, \ldots, X_m$. For a Gaussian random
vector $\mathbf{X} = (X_1, \ldots, X_m)^T$, consider $Y = a_1 X_1 + \ldots + a_m X_m$, where $a_1, \ldots, a_m$ can be any scalar
constants. By definition, Y is a Gaussian random variable, and is completely characterized by
its mean and variance. We can compute these in terms of $\mathbf{m}_X$ and $\mathbf{C}_X$ using (5.55) and (5.56)
by noting that $Y = \mathbf{a}^T\mathbf{X}$, where $\mathbf{a} = (a_1, \ldots, a_m)^T$. Thus,

$$m_Y = \mathbf{a}^T\mathbf{m}_X$$

$$\sigma_Y^2 = \mathrm{var}(Y) = \mathbf{a}^T\mathbf{C}_X\mathbf{a}$$

We have therefore shown that we can characterize the mean and variance, and hence the density,
of an arbitrarily chosen linear combination Y if and only if we know the mean vector $\mathbf{m}_X$ and
covariance matrix $\mathbf{C}_X$. As we see in Problem 5.47, this is the basis for the desired result that
the distribution of a Gaussian random vector $\mathbf{X}$ is completely characterized by $\mathbf{m}_X$ and $\mathbf{C}_X$.

Notation for joint Gaussianity: We use the notation $\mathbf{X} \sim N(\mathbf{m}, \mathbf{C})$ to denote a Gaussian
random vector $\mathbf{X}$ with mean vector $\mathbf{m}$ and covariance matrix $\mathbf{C}$.

The preceding definitions and observations regarding joint Gaussianity apply even when the
random variables involved do not have a joint density. For example, it is easy to check that,
according to this definition, $X_1$ and $X_2 = 4X_1 - 1$ are jointly Gaussian. However, the joint density
of $X_1$ and $X_2$ is not well-defined (unless we allow delta functions), since all of the probability
mass in the two-dimensional $(x_1, x_2)$ plane is collapsed onto the line $x_2 = 4x_1 - 1$. Of course,
since $X_2$ is completely determined by $X_1$, any probability involving $X_1, X_2$ can be expressed in
terms of $X_1$ alone. In general, when the m-dimensional joint density does not exist, probabilities
involving $X_1, \ldots, X_m$ can be expressed in terms of a smaller number of random variables, and
can be evaluated using a joint density over a lower-dimensional space. A necessary and sufficient
condition for the joint density to exist is that the covariance matrix is invertible.
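
To make the degenerate case concrete, the following worked computation checks that the covariance matrix of $(X_1, X_2)^T$ with $X_2 = 4X_1 - 1$ is singular. The symbol $v^2$ for the variance of $X_1$ is introduced here only for illustration; the computation uses Properties 1 and 2 of covariance.

```latex
\begin{align*}
\mathrm{cov}(X_1, X_2) &= \mathrm{cov}(X_1, 4X_1 - 1) = 4\,\mathrm{var}(X_1) = 4v^2 \\
\mathrm{var}(X_2) &= \mathrm{cov}(4X_1 - 1, 4X_1 - 1) = 16\,\mathrm{var}(X_1) = 16v^2 \\
\mathbf{C} &= v^2 \begin{pmatrix} 1 & 4 \\ 4 & 16 \end{pmatrix}, \qquad
|\mathbf{C}| = v^4\,(1 \cdot 16 - 4 \cdot 4) = 0
\end{align*}
```

The determinant vanishes for any $v^2$, which is consistent with all of the probability mass lying on the line $x_2 = 4x_1 - 1$.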

Joint Gaussian density exists if and only if the covariance matrix is invertible: We
do not prove this result, but discuss it in the context of the two-dimensional density in Example
5.6.5.

Joint Gaussian density: For $\mathbf{X} = (X_1, \ldots, X_m)^T \sim N(\mathbf{m}, \mathbf{C})$, if $\mathbf{C}$ is invertible, the joint density
exists and takes the following form (we skip the derivation, but see Problem 5.47):

$$p(x_1, \ldots, x_m) = p(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^m |\mathbf{C}|}} \exp\left(-\frac{1}{2}(\mathbf{x} - \mathbf{m})^T \mathbf{C}^{-1} (\mathbf{x} - \mathbf{m})\right) \qquad (5.58)$$

where $|\mathbf{C}|$ denotes the determinant of $\mathbf{C}$.


Example 5.6.5 (Two-dimensional joint Gaussian density) In order to visualize the joint
Gaussian density (this is not needed for the remainder of the development, hence this example
can be skipped), let us consider two jointly Gaussian random variables X and Y. In this case, it
is convenient to define the normalized correlation between X and Y as

$$\rho(X, Y) = \frac{\mathrm{cov}(X, Y)}{\sqrt{\mathrm{var}(X)\,\mathrm{var}(Y)}} \qquad (5.59)$$

Figure 5.15: Joint Gaussian density and its contours for $\sigma_X^2 = 1$, $\sigma_Y^2 = 4$ and $\rho = -0.5$. Panels: (a) joint Gaussian density; (b) contours of density.

Thus, $\mathrm{cov}(X, Y) = \rho\,\sigma_X \sigma_Y$, where $\mathrm{var}(X) = \sigma_X^2$, $\mathrm{var}(Y) = \sigma_Y^2$, and the covariance matrix for the
random vector $(X, Y)^T$ is given by

$$\mathbf{C} = \begin{pmatrix} \sigma_X^2 & \rho\,\sigma_X \sigma_Y \\ \rho\,\sigma_X \sigma_Y & \sigma_Y^2 \end{pmatrix} \qquad (5.60)$$


It is shown in Problem 5.46 that $|\rho| \le 1$. For $|\rho| = 1$, it is easy to check that the covariance matrix
has determinant zero, hence the joint density formula (5.58) cannot be applied. As shown in
Problem 5.46, this has a simple geometric interpretation: $|\rho| = 1$ corresponds to a situation when
X and Y are affine functions of each other, so that all of the probability mass is concentrated
on a line, hence a two-dimensional density does not exist. Thus, we need the strict inequality
$|\rho| < 1$ for the covariance matrix to be invertible. Assuming that $|\rho| < 1$, we plug (5.60) into
(5.58), setting the mean vector to zero without loss of generality (a nonzero mean vector simply
shifts the density). We get the joint density shown in Figure 5.15 for $\sigma_X^2 = 1$, $\sigma_Y^2 = 4$ and
$\rho = -0.5$. Since Y has larger variance, the density decays more slowly in Y than in X. The
negative normalized correlation leads to contour plots given by tilted ellipses, corresponding to
setting the quadratic function $\mathbf{x}^T\mathbf{C}^{-1}\mathbf{x}$ in the exponent of the density to different constants.

Exercise: Show that the ellipses shown in Figure 5.15(b) can be described as

$$x^2 + a y^2 + b x y = c$$

specifying the values of a and b.
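
As a sanity check on (5.58) and (5.60), the density can be evaluated on a grid and summed numerically. The sketch below assumes the parameter values quoted for Figure 5.15 (variances 1 and 4, normalized correlation -0.5); the grid limits and spacing are arbitrary choices made here, and the variable names are illustrative.

```matlab
% Zero-mean two-dimensional Gaussian density (5.58) with covariance (5.60).
% Assumed parameters: var(X) = 1, var(Y) = 4, rho = -0.5.
sx = 1; sy = 2; rho = -0.5;
C = [sx^2 rho*sx*sy; rho*sx*sy sy^2];    % covariance matrix (5.60)
Cinv = inv(C);
step = 0.05;                             % grid spacing (arbitrary choice)
[x, y] = meshgrid(-6:step:6, -12:step:12);
% quadratic form x^T C^{-1} x evaluated at every grid point
quadform = Cinv(1,1)*x.^2 + 2*Cinv(1,2)*x.*y + Cinv(2,2)*y.^2;
p = exp(-quadform/2)/(2*pi*sqrt(det(C)));   % density values on the grid
total = sum(p(:))*step^2;                % Riemann sum, should be close to 1
```

Calling contour(x, y, p) on these arrays draws level curves of the density, which are the tilted ellipses discussed in the example.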


While  we  hardly  ever  integrate  the  joint  Gaussian  density  to  compute  probabilities,  we  use  its 
form  to  derive  many  important  results.  One  such  result  is  stated  below. 

Uncorrelated jointly Gaussian random variables are independent: This follows from
the form of the joint Gaussian density (5.58). If $X_1, \ldots, X_m$ are pairwise uncorrelated, then the
off-diagonal entries of the covariance matrix $\mathbf{C}$ are zero: $\mathbf{C}(i,j) = 0$ for $i \ne j$. Thus, $\mathbf{C}$ and
$\mathbf{C}^{-1}$ are both diagonal matrices, with diagonal entries given by $\mathbf{C}(i,i) = v_i^2$, $\mathbf{C}^{-1}(i,i) = 1/v_i^2$,
$i = 1, \ldots, m$, and determinant $|\mathbf{C}| = v_1^2 \cdots v_m^2$. In this case, we see that the joint density (5.58)
decomposes into a product of marginal densities:

$$p(x_1, \ldots, x_m) = \frac{1}{\sqrt{2\pi v_1^2}}\, e^{-\frac{(x_1 - m_1)^2}{2v_1^2}} \cdots \frac{1}{\sqrt{2\pi v_m^2}}\, e^{-\frac{(x_m - m_m)^2}{2v_m^2}} = p(x_1) \cdots p(x_m)$$

so that $X_1, \ldots, X_m$ are independent.

Recall that, while independent random variables are uncorrelated, the converse need not be true.
However, when we put the additional restriction of joint Gaussianity, uncorrelatedness does imply
independence.

We can now characterize the distribution of affine transformations of jointly Gaussian random
variables. If $\mathbf{X}$ is a Gaussian random vector, then $\mathbf{Y} = \mathbf{A}\mathbf{X} + \mathbf{b}$ is also Gaussian. To see this,
note that any linear combination of $Y_1, \ldots, Y_n$ equals a linear combination of $X_1, \ldots, X_m$ (plus a
constant), which is a Gaussian random variable by the Gaussianity of $\mathbf{X}$. Since $\mathbf{Y}$ is Gaussian,
its distribution is completely characterized by its mean vector and covariance matrix, which we
have just computed. We can now state the following result.

Joint Gaussianity is preserved under affine transformations

If $\mathbf{X} \sim N(\mathbf{m}, \mathbf{C})$, then $\mathbf{A}\mathbf{X} + \mathbf{b} \sim N(\mathbf{A}\mathbf{m} + \mathbf{b}, \mathbf{A}\mathbf{C}\mathbf{A}^T)$ (5.61)


Example 5.6.6 (Computations with jointly Gaussian random variables) As in Example
5.6.4, consider two random variables $X_1$ and $X_2$ such that $X_1$ has mean -1 and variance 4, $X_2$
has mean 2 and variance 9, and $\mathrm{cov}(X_1, X_2) = -3$. Now assume in addition that these random
variables are jointly Gaussian.

(a) Write down the mean vector and covariance matrix for the random vector $\mathbf{Y} = (Y_1, Y_2)^T$,
where $Y_1 = 3X_1 - 2X_2 + 3$ and $Y_2 = X_1 + X_2 - 2$.

(b) Evaluate the probability $P[3X_1 - 2X_2 < 5]$ in terms of the Q function with positive arguments.

(c) Suppose that $Z = aX_1 + X_2$. Find the constant a such that Z is independent of $X_1$.

Solution to (a): We have already found the mean and covariance of $\mathbf{X}$ in Example 5.6.4; they
are given by (5.57). Now, $\mathbf{Y} = \mathbf{A}\mathbf{X} + \mathbf{b}$, where

$$\mathbf{A} = \begin{pmatrix} 3 & -2 \\ 1 & 1 \end{pmatrix}, \quad \mathbf{b} = \begin{pmatrix} 3 \\ -2 \end{pmatrix}$$

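We can now apply (5.61) to obtain the mean vector and covariance matrix for $\mathbf{Y}$:

$$\mathbf{m}_Y = \mathbf{A}\mathbf{m}_X + \mathbf{b} = \begin{pmatrix} -4 \\ -1 \end{pmatrix}$$

$$\mathbf{C}_Y = \mathbf{A}\mathbf{C}_X\mathbf{A}^T = \begin{pmatrix} 108 & -9 \\ -9 & 7 \end{pmatrix}$$

Solution to (b): Since $Y_1 = 3X_1 - 2X_2 + 3 \sim N(-4, 108)$, the required probability can be written
as

$$P[3X_1 - 2X_2 < 5] = P[Y_1 < 8] = \Phi\left(\frac{8 - (-4)}{\sqrt{108}}\right) = \Phi\left(2/\sqrt{3}\right) = 1 - Q\left(2/\sqrt{3}\right)$$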

Solution to (c): Since $Z = aX_1 + X_2$ and $X_1$ are jointly Gaussian, they are independent if they
are uncorrelated. The covariance is given by

$$\mathrm{cov}(Z, X_1) = \mathrm{cov}(aX_1 + X_2, X_1) = a\,\mathrm{cov}(X_1, X_1) + \mathrm{cov}(X_2, X_1) = 4a - 3$$

so that we need $a = 3/4$ for Z and $X_1$ to be independent.
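
The computations in Example 5.6.6 can be reproduced numerically. The following sketch assumes that the Q function is written via the built-in erfc; the names Q, prob and covZX1 are introduced here for illustration only.

```matlab
% Numerical check of Example 5.6.6
Q = @(x) 0.5*erfc(x/sqrt(2));      % Gaussian tail probability via erfc
mX = [-1; 2];                      % mean vector from (5.57)
CX = [4 -3; -3 9];                 % covariance matrix from (5.57)
A  = [3 -2; 1 1];                  % Y1 = 3*X1 - 2*X2 + 3, Y2 = X1 + X2 - 2
b  = [3; -2];
mY = A*mX + b;                     % part (a): mean vector, (-4, -1)^T
CY = A*CX*A';                      % part (a): covariance, [108 -9; -9 7]
% part (b): P[Y1 < 8] for Y1 ~ N(mY(1), CY(1,1))
prob = 1 - Q((8 - mY(1))/sqrt(CY(1,1)));
% part (c): cov(a*X1 + X2, X1) = [a 1]*CX*[1; 0], which vanishes for a = 3/4
a = 3/4;
covZX1 = [a 1]*CX*[1; 0];
```

Writing the covariance in part (c) as a product of row vector, covariance matrix and column vector is just the bilinearity property in matrix form.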


Discrete time WGN: The noise model $\mathbf{N} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$ is called discrete time white Gaussian
noise (WGN). The term white refers to the noise samples being uncorrelated and having equal
variance. We will see how such discrete time WGN arises from continuous-time WGN, which we
discuss during our coverage of random processes later in this chapter.

Example 5.6.7 (Binary on-off keying in discrete time WGN) Let us now revisit on-off
keying, explored for scalar observations in Example 5.6.3, for vector observations. The receiver
processes a vector $\mathbf{Y} = (Y_1, \ldots, Y_n)^T$ of samples modeled as follows: $\mathbf{Y} = \mathbf{s} + \mathbf{N}$ if 1 is sent, and
$\mathbf{Y} = \mathbf{N}$ if 0 is sent, where $\mathbf{s} = (s_1, \ldots, s_n)^T$ is the signal, and the noise $\mathbf{N} = (N_1, \ldots, N_n)^T \sim N(\mathbf{0}, \sigma^2\mathbf{I})$.
That is, the noise samples $N_1, \ldots, N_n$ are i.i.d. $N(0, \sigma^2)$ random variables. Suppose
we use the following correlator-based decision statistic:

$$Z = \mathbf{s}^T\mathbf{Y} = \sum_{k=1}^{n} s_k Y_k$$

Thus,  we  have  reduced  the  vector  observation  to  a  single  number  based  on  which  we  will  make 
our  decision.  The  hypothesis  framework  developed  in  Chapter  6  will  be  used  to  show  that  this 
decision  statistic  is  optimal,  in  a  well-defined  sense.  For  now,  we  simply  accept  it  as  given. 

(a)  Find  the  conditional  distribution  of  Z  given  that  0  is  sent. 

(b)  Find  the  conditional  distribution  of  Z  given  that  1  is  sent. 

(c) Observe from (a) and (b) that we are now back to the setting of Example 5.6.3, with Z now
playing the role of Y. Specify the values of $m$ and $v^2$, and the $\mathrm{SNR} = \frac{m^2}{2v^2}$, in terms of $\mathbf{s}$ and $\sigma^2$.

(d) As in Example 5.6.3, consider the simple decision rule that says that 1 is sent if $Z > m/2$, and
that 0 is sent if $Z < m/2$. Find the error probability (in terms of the Q function) as a function
of $\mathbf{s}$ and $\sigma^2$.

(e) Evaluate the error probability for $\mathbf{s} = (-2, 2, 1)^T$ and $\sigma^2 = 1/4$.

Solution: 

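(a) If 0 is sent, then $\mathbf{Y} = \mathbf{N} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$. Applying (5.61) with $\mathbf{m} = \mathbf{0}$, $\mathbf{A} = \mathbf{s}^T$, $\mathbf{C} = \sigma^2\mathbf{I}$, we
obtain $Z = \mathbf{s}^T\mathbf{Y} \sim N(0, \sigma^2\|\mathbf{s}\|^2)$.

(b) If 1 is sent, then $\mathbf{Y} = \mathbf{s} + \mathbf{N} \sim N(\mathbf{s}, \sigma^2\mathbf{I})$. Applying (5.61) with $\mathbf{m} = \mathbf{s}$, $\mathbf{A} = \mathbf{s}^T$, $\mathbf{C} = \sigma^2\mathbf{I}$,
we obtain $Z = \mathbf{s}^T\mathbf{Y} \sim N(\|\mathbf{s}\|^2, \sigma^2\|\mathbf{s}\|^2)$. Alternatively, $\mathbf{s}^T\mathbf{Y} = \mathbf{s}^T(\mathbf{s} + \mathbf{N}) = \|\mathbf{s}\|^2 + \mathbf{s}^T\mathbf{N}$. Since
$\mathbf{s}^T\mathbf{N} \sim N(0, \sigma^2\|\mathbf{s}\|^2)$ from (a), we simply translate the mean by $\|\mathbf{s}\|^2$.

(c) Comparing with Example 5.6.3, we see that $m = \|\mathbf{s}\|^2$, $v^2 = \sigma^2\|\mathbf{s}\|^2$, and $\mathrm{SNR} = \frac{m^2}{2v^2} = \frac{\|\mathbf{s}\|^2}{2\sigma^2}$.

(d) From Example 5.6.3, we know that the decision rule that splits the difference between the
means has error probability

$$P_e = P_{e|0} = P_{e|1} = Q\left(\frac{m}{2v}\right) = Q\left(\frac{\|\mathbf{s}\|}{2\sigma}\right)$$

plugging in the expressions for $m$ and $v^2$ from (c).

(e) We have $\|\mathbf{s}\|^2 = 9$. Using (d), we obtain
$P_e = Q(3) = 0.0013$.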
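
The numbers in part (e) of Example 5.6.7 can be reproduced with a few lines of Matlab. This is a minimal sketch, again assuming a Q function built from erfc; the variable names are chosen here for illustration.

```matlab
% Example 5.6.7(e): on-off keying in discrete time WGN
Q = @(x) 0.5*erfc(x/sqrt(2));   % Gaussian tail probability via erfc
s = [-2; 2; 1];                 % signal vector
sigma2 = 1/4;                   % noise variance per sample
m  = s'*s;                      % mean of Z if 1 is sent, ||s||^2 = 9
v2 = sigma2*(s'*s);             % variance of Z under either hypothesis, 9/4
SNR = m^2/(2*v2);               % equals ||s||^2/(2*sigma2) = 18
Pe = Q(m/(2*sqrt(v2)));         % equals Q(||s||/(2*sigma)) = Q(3)
```

The SNR depends on the signal only through its energy $\|\mathbf{s}\|^2$, so any other signal with the same energy would give the same error probability in white noise.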

Noise  is  termed  colored  when  it  is  not  white;  that  is,  when  the  noise  samples  are  correlated  and/or 
have  different  variances.  We  will  see  later  how  colored  noise  arises  from  linear  transformations 
on  white  noise.  Let  us  continue  our  sequence  of  examples  regarding  on-off  keying,  but  now  with 
colored  noise. 

Example 5.6.8 (Binary on-off keying in discrete time colored Gaussian noise) As in
the previous example, we have a vector observation $\mathbf{Y} = (Y_1, \ldots, Y_n)^T$, with $\mathbf{Y} = \mathbf{s} + \mathbf{N}$ if 1 is
sent, and $\mathbf{Y} = \mathbf{N}$ if 0 is sent, where $\mathbf{s} = (s_1, \ldots, s_n)^T$ is the signal. However, we now allow the
noise covariance matrix to be arbitrary: $\mathbf{N} = (N_1, \ldots, N_n)^T \sim N(\mathbf{0}, \mathbf{C}_N)$.

(a) Consider the decision statistic $Z_1 = \mathbf{s}^T\mathbf{Y}$. Find the conditional distributions of $Z_1$ given 0
sent, and given 1 sent.

(b) Show that $Z_1$ follows the scalar on-off keying model of Example 5.6.3, specifying the parameters
$m_1$ and $v_1^2$, and $\mathrm{SNR}_1 = \frac{m_1^2}{2v_1^2}$, in terms of $\mathbf{s}$ and $\mathbf{C}_N$.

(c) Find the error probability of the simple decision rule comparing $Z_1$ to the threshold $m_1/2$.

(d) Repeat (a)-(c) for a decision statistic $Z_2 = \mathbf{s}^T\mathbf{C}_N^{-1}\mathbf{Y}$ (use the notation $m_2$, $v_2^2$ and $\mathrm{SNR}_2$ to
denote the quantities analogous to those in (b)).

(e) Apply the preceding to the following example: two-dimensional observation $\mathbf{Y} = (Y_1, Y_2)^T$
with $\mathbf{s} = (4, -2)^T$ and

$$\mathbf{C}_N = \begin{pmatrix} 1 & -1 \\ -1 & 4 \end{pmatrix}$$

Find explicit expressions for $Z_1$ and $Z_2$ in terms of $Y_1$ and $Y_2$. Compute and compare the SNRs
and error probabilities obtained with the two decision statistics.

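Solution: We proceed similarly to Example 5.6.7.

(a) If 0 is sent, then $\mathbf{Y} = \mathbf{N} \sim N(\mathbf{0}, \mathbf{C}_N)$. Applying (5.61) with $\mathbf{m} = \mathbf{0}$, $\mathbf{A} = \mathbf{s}^T$, $\mathbf{C} = \mathbf{C}_N$, we
obtain $Z_1 = \mathbf{s}^T\mathbf{Y} \sim N(0, \mathbf{s}^T\mathbf{C}_N\mathbf{s})$.

If 1 is sent, then $\mathbf{Y} = \mathbf{s} + \mathbf{N} \sim N(\mathbf{s}, \mathbf{C}_N)$. Applying (5.61) with $\mathbf{m} = \mathbf{s}$, $\mathbf{A} = \mathbf{s}^T$, $\mathbf{C} = \mathbf{C}_N$, we
obtain $Z_1 = \mathbf{s}^T\mathbf{Y} \sim N(\|\mathbf{s}\|^2, \mathbf{s}^T\mathbf{C}_N\mathbf{s})$. Alternatively, $\mathbf{s}^T\mathbf{Y} = \mathbf{s}^T(\mathbf{s} + \mathbf{N}) = \|\mathbf{s}\|^2 + \mathbf{s}^T\mathbf{N}$. Since
$\mathbf{s}^T\mathbf{N} \sim N(0, \mathbf{s}^T\mathbf{C}_N\mathbf{s})$ from (a), we simply translate the mean by $\|\mathbf{s}\|^2$.

(b) Comparing with Example 5.6.3, we see that $m_1 = \|\mathbf{s}\|^2$, $v_1^2 = \mathbf{s}^T\mathbf{C}_N\mathbf{s}$, and $\mathrm{SNR}_1 = \frac{m_1^2}{2v_1^2} = \frac{\|\mathbf{s}\|^4}{2\,\mathbf{s}^T\mathbf{C}_N\mathbf{s}}$.

(c) From Example 5.6.3, we know that the decision rule that splits the difference between the
means has error probability

$$P_{e1} = Q\left(\frac{m_1}{2v_1}\right) = Q\left(\frac{\|\mathbf{s}\|^2}{2\sqrt{\mathbf{s}^T\mathbf{C}_N\mathbf{s}}}\right)$$

plugging in the expressions for $m_1$ and $v_1^2$ from (b).

(d) We now have $Z_2 = \mathbf{s}^T\mathbf{C}_N^{-1}\mathbf{Y}$. If 0 is sent, $\mathbf{Y} = \mathbf{N} \sim N(\mathbf{0}, \mathbf{C}_N)$. Applying (5.61) with $\mathbf{m} = \mathbf{0}$,
$\mathbf{A} = \mathbf{s}^T\mathbf{C}_N^{-1}$, $\mathbf{C} = \mathbf{C}_N$, we obtain $Z_2 = \mathbf{s}^T\mathbf{C}_N^{-1}\mathbf{Y} \sim N(0, \mathbf{s}^T\mathbf{C}_N^{-1}\mathbf{s})$.

If 1 is sent, then $\mathbf{Y} = \mathbf{s} + \mathbf{N} \sim N(\mathbf{s}, \mathbf{C}_N)$. Applying (5.61) with $\mathbf{m} = \mathbf{s}$, $\mathbf{A} = \mathbf{s}^T\mathbf{C}_N^{-1}$, $\mathbf{C} = \mathbf{C}_N$,
we obtain $Z_2 = \mathbf{s}^T\mathbf{C}_N^{-1}\mathbf{Y} \sim N(\mathbf{s}^T\mathbf{C}_N^{-1}\mathbf{s}, \mathbf{s}^T\mathbf{C}_N^{-1}\mathbf{s})$. That is, $m_2 = \mathbf{s}^T\mathbf{C}_N^{-1}\mathbf{s}$, $v_2^2 = \mathbf{s}^T\mathbf{C}_N^{-1}\mathbf{s}$, and
$\mathrm{SNR}_2 = \frac{m_2^2}{2v_2^2} = \frac{\mathbf{s}^T\mathbf{C}_N^{-1}\mathbf{s}}{2}$. The corresponding error probability is $P_{e2} = Q\left(\frac{m_2}{2v_2}\right) = Q\left(\frac{\sqrt{\mathbf{s}^T\mathbf{C}_N^{-1}\mathbf{s}}}{2}\right)$.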

(e) For the given example, we find $Z_1 = \mathbf{s}^T\mathbf{Y} = 4Y_1 - 2Y_2$ and $Z_2 = \mathbf{s}^T\mathbf{C}_N^{-1}\mathbf{Y} = \frac{2}{3}(7Y_1 + Y_2)$.
We can see that the relative weights of the two observations are quite different in the two cases.
Numerical computations using the Matlab script below yield SNRs of 6.2 dB and 9.4 dB, and
error probabilities of 0.07 and 0.02 in the two cases, so that $Z_2$ provides better performance than
$Z_1$. It can be shown, using the methods of Chapter 6, that $Z_2$ is actually the optimal decision
statistic, both in terms of maximizing SNR and minimizing error probability.


A  Matlab  code  fragment  for  generating  the  numerical  results  in  Example  5.6.8(e)  is  given  below. 

Code Fragment 5.6.3 (Performance of on-off keying in colored Gaussian noise)

%%OOK with colored noise: N(s,C_N) versus N(0,C_N)
s=[4;-2]; %signal
Cn=[1 -1;-1 4]; %noise covariance matrix
%%decision statistic Z1 = s^T Y
m1=s'*s; %mean if 1 sent
variance1=s'*Cn*s; %variance under each hypothesis
v1=sqrt(variance1); %standard deviation
SNR1 = m1^2/(2*variance1); %SNR
Pe1 = qfunction(m1/(2*v1)); %error prob for "split the difference" rule using Z1
%%decision statistic Z2 = s^T Cn^{-1} Y
m2 = s'*inv(Cn)*s; %mean if 1 sent
variance2=s'*inv(Cn)*s; %variance=mean in this case
v2=sqrt(variance2); %standard deviation
SNR2 = m2^2/(2*variance2); %reduces to SNR2= m2/2 in this case
Pe2 = qfunction(m2/(2*v2)); %error prob for "split the difference" rule using Z2
%Compare performance of the two rules
10*log10([SNR1 SNR2]) %SNRs in dB
[Pe1 Pe2] %error probabilities
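
The analytical error probabilities can also be compared against a simulation. The following is a rough Monte Carlo sketch, not part of the original code fragment: it assumes that colored noise with covariance Cn is generated by multiplying white Gaussian samples by a Cholesky factor of Cn, and the trial count and all variable names are arbitrary choices made here.

```matlab
% Monte Carlo check of Example 5.6.8(e)
s = [4; -2];                          % signal
Cn = [1 -1; -1 4];                    % noise covariance matrix
ntrials = 200000;                     % number of simulated symbols (arbitrary)
L = chol(Cn, 'lower');                % Cn = L*L', so L*W has covariance Cn for white W
N = L*randn(2, ntrials);              % colored noise, one column per trial
bits = double(rand(1, ntrials) > 0.5);   % equally likely 0s and 1s
Y = s*bits + N;                       % Y = s + N if 1 sent, Y = N if 0 sent
w1 = s;                               % correlator weights for Z1 = s^T Y
w2 = Cn\s;                            % correlator weights for Z2 = s^T Cn^{-1} Y
Z1 = w1'*Y;
Z2 = w2'*Y;
% "split the difference" rule: say 1 if the statistic exceeds half its mean under 1
Pe1_sim = mean((Z1 > (w1'*s)/2) ~= bits);
Pe2_sim = mean((Z2 > (w2'*s)/2) ~= bits);
```

With this many trials, the simulated error rates should land near the analytical values of about 0.07 and 0.02, up to statistical fluctuation.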


5.7  Random  Processes 

A  key  limitation  on  the  performance  of  communication  systems  comes  from  receiver  noise,  which 
is  an  unavoidable  physical  phenomenon  (see  Appendix  5.C).  Noise  cannot  be  modeled  as  a 
deterministic  waveform  (i.e.,  we  do  not  know  what  noise  waveform  we  will  observe  at  any  given 
point  of  time).  Indeed,  neither  can  the  desired  signals  in  a  communication  system,  even  though 
we  have  sometimes  pretended  otherwise  in  prior  chapters.  Information-bearing  signals  such 
as speech, audio, and video are best modeled as being randomly chosen from a vast ensemble of
possibilities.  Similarly,  the  bit  stream  being  transmitted  in  a  digital  communication  system  can 
be  arbitrary,  and  can  therefore  be  thought  of  as  being  randomly  chosen  from  a  large  number  of 
possible  bit  streams.  It  is  time,  therefore,  to  learn  how  to  deal  with  random  processes,  which  is 
the  technical  term  we  use  for  signals  that  are  chosen  randomly  from  an  ensemble,  or  collection,  of 
possible  signals.  A  detailed  investigation  of  random  processes  is  well  beyond  our  scope,  and  our 
goal  here  is  limited  to  developing  a  working  understanding  of  concepts  critical  to  our  study  of 
communication  systems.  We  shall  see  that  this  goal  can  be  achieved  using  elementary  extensions 
of  the  probability  concepts  covered  earlier  in  this  chapter. 


5.7.1  Running  example:  sinusoid  with  random  amplitude  and  phase 

Let us work through a simple example before we embark on a systematic development. Suppose
that $X_1$ and $X_2$ are i.i.d. $N(0, 1)$ random variables, and define

$$X(t) = X_1 \cos 2\pi f_c t - X_2 \sin 2\pi f_c t \qquad (5.62)$$

where $f_c > 0$ is a fixed frequency. The waveform $X(t)$ is not a deterministic signal, since
$X_1$ and $X_2$ can take random values on the real line. Indeed, for each time t, $X(t)$ is a random
variable, since it is a linear combination of two random variables $X_1$ and $X_2$ defined on a common
probability space. Moreover, if we pick a number of times $t_1, t_2, \ldots$, then the corresponding
samples $X(t_1), X(t_2), \ldots$ are random variables on a common probability space.

Another interpretation of $X(t)$ is obtained by converting $(X_1, X_2)$ to polar form:

$$X_1 = A \cos\Theta, \quad X_2 = A \sin\Theta$$

For $X_1, X_2$ i.i.d. $N(0, 1)$, we know from Problem 5.21 that $A$ is Rayleigh, $\Theta$ is uniform over
$[0, 2\pi]$, and $A$, $\Theta$ are independent. The random process $X(t)$ can be rewritten as

$$X(t) = A \cos\Theta \cos 2\pi f_c t - A \sin\Theta \sin 2\pi f_c t = A \cos(2\pi f_c t + \Theta) \qquad (5.63)$$

Thus, $X(t)$ is a sinusoid with random amplitude and phase.

For a given time t, what is the distribution of $X(t)$? Since $X(t)$ is a linear combination of i.i.d.
Gaussian, hence jointly Gaussian, random variables $X_1$ and $X_2$, we infer that it is a Gaussian
random variable. Its distribution is therefore specified by computing its mean and variance, as
follows:

$$E[X(t)] = E[X_1] \cos 2\pi f_c t - E[X_2] \sin 2\pi f_c t = 0 \qquad (5.64)$$

$$\mathrm{var}(X(t)) = \mathrm{cov}(X_1 \cos 2\pi f_c t - X_2 \sin 2\pi f_c t,\; X_1 \cos 2\pi f_c t - X_2 \sin 2\pi f_c t)$$
$$= \mathrm{cov}(X_1, X_1) \cos^2 2\pi f_c t + \mathrm{cov}(X_2, X_2) \sin^2 2\pi f_c t - 2\,\mathrm{cov}(X_1, X_2) \cos 2\pi f_c t \sin 2\pi f_c t \qquad (5.65)$$
$$= \cos^2 2\pi f_c t + \sin^2 2\pi f_c t = 1$$

using $\mathrm{cov}(X_i, X_i) = \mathrm{var}(X_i) = 1$, $i = 1, 2$, and $\mathrm{cov}(X_1, X_2) = 0$ (since $X_1$, $X_2$ are independent).
Thus, we have $X(t) \sim N(0, 1)$ for any t.

In this particular example, we can also easily specify the joint distribution of any set of $n$ samples,
$X(t_1), ..., X(t_n)$, where $n$ can be arbitrarily chosen. The samples are jointly Gaussian, since they
are linear combinations of the jointly Gaussian random variables $X_1$, $X_2$. Thus, we only need to
specify their means and pairwise covariances. We have just shown that the means are zero, and
that the diagonal entries of the covariance matrix are one. More generally, the covariance of any
two samples can be computed as follows:

$$\begin{aligned}
\mathrm{cov}(X(t_i), X(t_j)) &= \mathrm{cov}(X_1\cos 2\pi f_c t_i - X_2\sin 2\pi f_c t_i,\; X_1\cos 2\pi f_c t_j - X_2\sin 2\pi f_c t_j) \\
&= \mathrm{cov}(X_1, X_1)\cos 2\pi f_c t_i \cos 2\pi f_c t_j + \mathrm{cov}(X_2, X_2)\sin 2\pi f_c t_i \sin 2\pi f_c t_j \\
&\quad - \mathrm{cov}(X_1, X_2)\left(\cos 2\pi f_c t_i \sin 2\pi f_c t_j + \sin 2\pi f_c t_i \cos 2\pi f_c t_j\right) \\
&= \cos 2\pi f_c t_i \cos 2\pi f_c t_j + \sin 2\pi f_c t_i \sin 2\pi f_c t_j \\
&= \cos 2\pi f_c (t_i - t_j)
\end{aligned} \tag{5.66}$$
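
As an illustration of (5.66), the covariance matrix for a set of sampling times can be tabulated directly. The sketch below computes it twice: once from the closed form $\cos 2\pi f_c(t_i - t_j)$, and once from the coefficients of $X_1$ and $X_2$ in each sample, assuming unit variances and zero cross-covariance. The sampling times, carrier frequency, and function names are hypothetical choices made for this example.

```python
import math

def covariance_matrix(times, fc):
    """C[i][j] = cos(2 pi fc (t_i - t_j)), the closed form in (5.66)."""
    return [[math.cos(2 * math.pi * fc * (ti - tj)) for tj in times] for ti in times]

def covariance_from_coefficients(times, fc):
    """Same matrix from X(t) = a(t) X1 + b(t) X2, with var(X1) = var(X2) = 1 and cov(X1, X2) = 0."""
    a = [math.cos(2 * math.pi * fc * t) for t in times]
    b = [-math.sin(2 * math.pi * fc * t) for t in times]
    n = len(times)
    return [[a[i] * a[j] + b[i] * b[j] for j in range(n)] for i in range(n)]

times = [0.0, 0.1, 0.25, 0.6]   # assumed sampling times
fc = 1.0                        # assumed carrier frequency
C = covariance_matrix(times, fc)
C_check = covariance_from_coefficients(times, fc)
for row in C:
    print([round(entry, 3) for entry in row])
```

Together with the zero mean vector, this matrix specifies the joint Gaussian distribution of the chosen samples.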


While we have so far discussed the random process $X(t)$ from a statistical point of view, for
fixed values of $X_1$ and $X_2$, we see that $X(t)$ is actually a deterministic signal. Specifically, if the
random vector $(X_1, X_2)$ is defined over a probability space $\Omega$, a particular outcome $\omega \in \Omega$ maps
to a particular realization $(X_1(\omega), X_2(\omega))$. This in turn maps to a deterministic “realization,” or
“sample path,” of $X(t)$, which we denote as $X(t, \omega)$:

$$X(t, \omega) = X_1(\omega)\cos 2\pi f_c t - X_2(\omega)\sin 2\pi f_c t$$

To see what these sample paths look like, it is easiest to refer to the polar form (5.63):

$$X(t, \omega) = A(\omega)\cos(2\pi f_c t + \Theta(\omega))$$

Thus,  as  shown  in  Figure  5.16,  different  sample  paths  have  different  amplitudes,  drawn  from  a 
Rayleigh  distribution,  along  with  phase  shifts  drawn  from  a  uniform  distribution. 
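
The mapping from an outcome to a sample path can be sketched in a few lines. The example below draws $(X_1, X_2)$, converts to the polar pair $(A, \Theta)$, and confirms numerically that the Cartesian and polar expressions describe the same deterministic signal. The seed, carrier frequency, and time grid are assumptions made for illustration, and `draw_sample_path` is a hypothetical helper name.

```python
import math
import random

def draw_sample_path(fc, rng):
    """Draw one outcome and return (A, Theta, path) for the random sinusoid."""
    x1 = rng.gauss(0.0, 1.0)                     # X1(omega)
    x2 = rng.gauss(0.0, 1.0)                     # X2(omega)
    amplitude = math.hypot(x1, x2)               # A(omega) = sqrt(X1^2 + X2^2)
    phase = math.atan2(x2, x1) % (2 * math.pi)   # Theta(omega), mapped into [0, 2 pi)

    def path(t):
        # Cartesian form of the sample path for this fixed outcome
        return x1 * math.cos(2 * math.pi * fc * t) - x2 * math.sin(2 * math.pi * fc * t)

    return amplitude, phase, path

rng = random.Random(7)                 # fixed seed for repeatability
fc = 1.0                               # assumed carrier frequency
grid = [k / 50 for k in range(101)]    # time grid on [0, 2]
paths = [draw_sample_path(fc, rng) for _ in range(2)]

# Largest mismatch between the Cartesian form and the polar form A cos(2 pi fc t + Theta)
max_gap = max(
    abs(path(t) - a * math.cos(2 * math.pi * fc * t + th))
    for a, th, path in paths
    for t in grid
)
print(max_gap)
```

Once `x1` and `x2` are drawn, nothing random remains inside `path`, which is the sense in which a sample path is a deterministic signal.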


5.7.2  Basic  definitions 

As  we  have  seen  earlier,  a  random  vector  X  =  (X1,...,Xn)T  is  a  finite  collection  of  random 
variables  defined  on  a  common  probability  space,  as  depicted  in  Figure  5.7.  A  random  process 
is  simply  a  generalization  of  this  concept,  where  the  number  of  such  random  variables  can  be 
infinite. 

Random process: A random process $X$ is a collection of random variables $\{X(t), t \in T\}$, where
the index set $T$ can be finite, countably infinite, or uncountably infinite. When we interpret the
index set as denoting time, as we often do for the scenarios of interest to us, a countable index
set corresponds to a discrete time random process, and an uncountable index set corresponds to
a continuous time random process. We denote by $X(t, \omega)$ the value taken by the random variable
$X(t)$ for any given outcome $\omega$ in the sample space.

Figure  5.16:  Two  sample  paths  for  a  sinusoid  with  random  amplitude  and  phase. 


For the sinusoid with random amplitude and phase, the sample space only needs to be rich
enough to support the two random variables $X_1$ and $X_2$ (or $A$ and $\Theta$), from which we can create
a continuum of random variables $X(t, \omega)$, $-\infty < t < \infty$:

$$\omega \to (X_1(\omega), X_2(\omega)) \to X(t, \omega)$$

In  general,  however,  the  source  of  randomness  can  be  much  richer.  Noise  in  a  receiver  circuit  is 
caused  by  random  motion  of  a  large  number  of  charge  carriers.  A  digitally  modulated  waveform 
depends  on  a  sequence  of  randomly  chosen  bits.  The  preceding  conceptual  framework  is  general 
enough  to  cover  all  such  scenarios. 

Sample paths: We can also interpret a random process as a signal drawn at random from an
ensemble, or collection, of possible signals. The signal we get at a particular random draw is
called a sample path, or realization, of the random process. Once we fix a sample path, it can be
treated like a deterministic signal. Specifically, for each fixed outcome $\omega \in \Omega$, the sample path
is $X(t, \omega)$, which varies only with $t$. We have already seen examples of sample paths for our
running example in Figure 5.16.

Finite-dimensional distributions: As indicated in Figure 5.17, the samples $X(t_1), ..., X(t_n)$
from a random process $X$ are mappings from a common sample space to the real line, with
$X(t_i, \omega)$ denoting the value of the random variable $X(t_i)$ for outcome $\omega \in \Omega$. The joint distribution
of these random variables depends on the underlying probability measure on the sample space
$\Omega$. We say that we “know” the statistics of a random process if we know the joint statistics of
an arbitrarily chosen finite collection of samples. That is, we know the joint distribution of the
samples $X(t_1), ..., X(t_n)$, regardless of the number of samples $n$, and the sampling times $t_1, ..., t_n$.
These joint distributions are called the finite-dimensional distributions of the random process,
with the joint distribution of $n$ samples called an $n$th order distribution. Thus, while a random
process may be composed of infinitely many random variables, when we specify its statistics, we
focus on a finite subset of these random variables.

For our running example (5.62), we observed that the samples are jointly Gaussian, and specified
the joint distribution by computing the means and covariances. This is a special case of a broader
class of Gaussian random processes (to be defined shortly) for which it is possible to characterize
finite-dimensional distributions compactly in this fashion. Often, however, it is not possible to
explicitly specify such distributions, but we can still compute useful quantities averaged across
sample paths.

Figure 5.17: Samples of a random process are random variables defined on a common probability
space.

Ensemble averages: Knowing the finite-dimensional distributions enables us to compute
statistical averages across the collection, or ensemble, of sample paths. Such averages are called
ensemble averages. We will be mainly interested in “second order” statistics (involving
expectations of products of at most two random variables), such as means and covariances. We define
these quantities in sufficient generality that they apply to complex-valued random processes, but
specialize to real-valued random processes in most of our computations.


5.7.3  Second  order  statistics 

Mean, autocorrelation, and autocovariance functions (ensemble averages): For a random
process $X(t)$, the mean function is defined as

$$m_X(t) = E[X(t)] \tag{5.67}$$

and the autocorrelation function as

$$R_X(t_1, t_2) = E[X(t_1)X^*(t_2)] \tag{5.68}$$

Note that $R_X(t, t) = E[|X(t)|^2]$ is the instantaneous power at time $t$. The autocovariance function
of $X$ is the autocorrelation function of the zero mean version of $X$, and is given by

$$C_X(t_1, t_2) = E[(X(t_1) - E[X(t_1)])(X(t_2) - E[X(t_2)])^*] = R_X(t_1, t_2) - m_X(t_1)m_X^*(t_2) \tag{5.69}$$

Second order statistics for running example: We have from (5.64) and (5.66) that

$$m_X(t) = 0, \qquad C_X(t_1, t_2) = R_X(t_1, t_2) = \cos 2\pi f_c(t_1 - t_2) \tag{5.70}$$

It is interesting to note that the mean function does not depend on $t$, and that the autocorrelation
and autocovariance functions depend only on the difference of the times $t_1 - t_2$. This implies
that if we shift $X(t)$ by some time delay $d$, the shifted process $\tilde{X}(t) = X(t - d)$ would have the
same mean and autocorrelation functions. Such translation invariance of statistics is interesting
and important enough to merit a formal definition, which we provide next.
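
The ensemble averages in (5.67) and (5.68) can be approximated by averaging across many simulated sample paths. The sketch below does this for the running example and compares two pairs of sampling times that share the same time difference. The number of paths, the seed, and the sampling times are illustrative assumptions, and the function names are hypothetical.

```python
import math
import random

def make_paths(num_paths, times, fc, rng):
    """Simulate sample paths of X(t) = X1 cos(2 pi fc t) - X2 sin(2 pi fc t) at the given times."""
    paths = []
    for _ in range(num_paths):
        x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        paths.append([x1 * math.cos(2 * math.pi * fc * t) - x2 * math.sin(2 * math.pi * fc * t)
                      for t in times])
    return paths

def ensemble_mean(paths, k):
    """Estimate m_X(t_k) by averaging across sample paths."""
    return sum(p[k] for p in paths) / len(paths)

def ensemble_autocorrelation(paths, k1, k2):
    """Estimate R_X(t_k1, t_k2); the process is real-valued, so no conjugate is needed."""
    return sum(p[k1] * p[k2] for p in paths) / len(paths)

rng = random.Random(1)
fc = 2.0
times = [0.00, 0.05, 0.30, 0.35]   # (t_1, t_0) and (t_3, t_2) both differ by 0.05
paths = make_paths(20000, times, fc, rng)

r_01 = ensemble_autocorrelation(paths, 1, 0)
r_23 = ensemble_autocorrelation(paths, 3, 2)
expected = math.cos(2 * math.pi * fc * 0.05)   # closed form from (5.70)
print(r_01, r_23, expected)
```

Both estimates should land near the same closed-form value, reflecting the dependence on the time difference alone.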


5.7.4 Wide Sense Stationarity and Stationarity

Wide sense stationary (WSS) random process: A random process $X$ is said to be WSS if

$$m_X(t) = m_X(0) \quad \text{for all } t$$

and

$$R_X(t_1, t_2) = R_X(t_1 - t_2, 0) \quad \text{for all } t_1, t_2$$

In this case, we change notation, dropping the time dependence in the notation for the mean
$m_X$, and expressing the autocorrelation function as a function of $\tau = t_1 - t_2$ alone. Thus, for a
WSS process, we can define the autocorrelation function as

$$R_X(\tau) = E[X(t)X^*(t - \tau)] \quad \text{for } X \text{ WSS} \tag{5.71}$$

with the understanding that the expectation is independent of $t$. Since the mean is independent
of time and the autocorrelation depends only on time differences, the autocovariance also depends
only on time differences, and is given by

$$C_X(\tau) = R_X(\tau) - |m_X|^2 \quad \text{for } X \text{ WSS} \tag{5.72}$$

Second order statistics for running example (new notation): With this new notation,
we have

$$m_X = 0, \qquad R_X(\tau) = C_X(\tau) = \cos 2\pi f_c \tau \tag{5.73}$$

A WSS random process has shift-invariant second order statistics. An even stronger notion of
shift-invariance is stationarity.

Stationary random process: A random process $X(t)$ is said to be stationary if it is statistically
indistinguishable from a delayed version of itself. That is, $X(t)$ and $X(t - d)$ have the same
statistics for any delay $d \in (-\infty, \infty)$.

Running example: The sinusoid with random amplitude and phase in our running example is
stationary. To see this, it is convenient to consider the polar form in (5.63): $X(t) = A\cos(2\pi f_c t + \Theta)$,
where $\Theta$ is uniformly distributed over $[0, 2\pi]$. Note that

$$Y(t) = X(t - d) = A\cos(2\pi f_c(t - d) + \Theta) = A\cos(2\pi f_c t + \Theta')$$

where $\Theta' = \Theta - 2\pi f_c d$ modulo $2\pi$ is uniformly distributed over $[0, 2\pi]$. Thus, $X$ and $Y$ are
statistically indistinguishable.
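
The key step in this argument is that shifting a uniform phase and reducing modulo $2\pi$ leaves it uniform. A quick numerical check, offered only as a sketch: the code below shifts simulated phases and measures the largest gap between their empirical distribution function and the uniform one. The carrier frequency, delay, seed, and sample size are arbitrary assumptions.

```python
import math
import random

def shifted_phase(theta, fc, d):
    """Theta' = Theta - 2 pi fc d, reduced modulo 2 pi."""
    return (theta - 2 * math.pi * fc * d) % (2 * math.pi)

def max_cdf_deviation(samples, upper):
    """Largest gap between the empirical CDF of the samples and the uniform CDF on [0, upper]."""
    ordered = sorted(samples)
    n = len(ordered)
    worst = 0.0
    for i, s in enumerate(ordered):
        u = s / upper                      # uniform CDF evaluated at the sample
        worst = max(worst, abs((i + 1) / n - u), abs(i / n - u))
    return worst

rng = random.Random(2)
fc, d = 5.0, 0.123                         # assumed carrier frequency and delay
thetas = [rng.uniform(0.0, 2 * math.pi) for _ in range(50000)]
thetas_shifted = [shifted_phase(th, fc, d) for th in thetas]
deviation = max_cdf_deviation(thetas_shifted, 2 * math.pi)
print(deviation)                           # small when the shifted phase is uniform
```

Since the amplitude $A$ is untouched by the delay and independent of the phase, uniformity of the shifted phase is all that needs checking.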

Stationarity implies wide sense stationarity: For a stationary random process $X$, the mean
function satisfies

$$m_X(t) = m_X(t - d)$$

for any $t$, regardless of the value of $d$. Choosing $d = t$, we infer that

$$m_X(t) = m_X(0) \tag{5.74}$$

That is, the mean function is a constant. Similarly, the autocorrelation function satisfies

$$R_X(t_1, t_2) = R_X(t_1 - d, t_2 - d)$$

for any $t_1, t_2$, regardless of the value of $d$. Setting $d = t_2$, we have that

$$R_X(t_1, t_2) = R_X(t_1 - t_2, 0) \tag{5.75}$$

Thus, a stationary process is also WSS.

While  our  running  example  was  easy  to  analyze,  in  general,  stationarity  is  a  stringent  requirement 
that  is  not  easy  to  verify.  For  our  needs,  the  weaker  concept  of  wide  sense  stationarity  typically 
suffices.  Further,  we  are  often  interested  in  Gaussian  random  processes  (defined  shortly),  for 
which  wide  sense  stationarity  actually  implies  stationarity. 


5.7.5 Power Spectral Density

Figure 5.18: Operational definition of PSD for a sample path x(t).


For  deterministic  finite-energy  signals,  we  introduced  the  concept  of  energy  spectral  density, 
which  specifies  how  the  energy  in  a  signal  is  distributed  in  different  frequency  bands,  in  Chapter 
2.  Similarly,  we  defined  power  spectral  density  (PSD)  for  finite-power  deterministic  signals  in 
Chapter  4,  “just  in  time”  to  characterize  the  spectral  occupancy  of  digital  communication  signals. 
This  deterministic  framework  directly  applies  to  a  given  sample  path  of  a  random  process,  and 
indeed,  this  is  what  we  did  when  we  computed  the  PSD  of  linearly  modulated  signals  in  Chapter 
4.  While  we  did  not  mention  the  term  “random  process”  then  (for  the  good  reason  that  we  had 
not  introduced  it  yet),  if  we  model  the  information  encoded  into  a  digitally  modulated  signal  as 
random,  then  the  latter  is  indeed  a  random  process.  Let  us  now  begin  by  restating  the  definition 
of  PSD  in  Chapter  4. 


Power Spectral Density: The power spectral density (PSD), $S_x(f)$, for a finite-power signal
$x(t)$, which we can now think of as a sample path of a random process, is defined through the
conceptual measurement depicted in Figure 5.18. Pass $x(t)$ through an ideal narrowband filter
with transfer function

$$H_\nu(f) = \begin{cases} 1, & \nu - \frac{\Delta f}{2} < f < \nu + \frac{\Delta f}{2} \\ 0, & \text{else} \end{cases}$$

The PSD evaluated at $\nu$, $S_x(\nu)$, is defined as the measured power at the filter output, divided
by the filter width $\Delta f$ (in the limit as $\Delta f \to 0$).


The  power  meter  in  Figure  5.18  is  averaging  over  time  to  estimate  the  power  in  a  frequency  slice 
of  a  particular  sample  path.  Let  us  review  how  this  is  done  before  discussing  how  to  average 
across  sample  paths  to  define  PSD  in  terms  of  an  ensemble  average. 


Periodogram-based PSD estimation: The PSD can be estimated by computing the Fourier
transform over a finite observation interval, and dividing its magnitude squared (which is the
energy spectral density) by the length of the observation interval. The time-windowed version of
$x$ is defined as

$$x_{T_o}(t) = x(t)\, I_{[-T_o/2,\, T_o/2]}(t) \tag{5.76}$$

where $T_o$ is the length of the observation interval. The Fourier transform of $x_{T_o}(t)$ is denoted as

$$X_{T_o}(f) = \mathcal{F}(x_{T_o})$$

The energy spectral density of $x_{T_o}$ is therefore $|X_{T_o}(f)|^2$, and the PSD estimate is given by

$$\hat{S}_x(f) = \frac{|X_{T_o}(f)|^2}{T_o} \tag{5.77}$$
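
The estimate (5.77) can be evaluated numerically for a single sample path. The sketch below approximates the windowed Fourier transform by a midpoint Riemann sum and applies it to one realization of the random sinusoid. The amplitude, phase, carrier frequency, window length, and step count are all assumed values chosen for illustration, and `periodogram` is a hypothetical helper rather than a library routine.

```python
import cmath
import math

def periodogram(x, T_o, f, num_steps=8000):
    """Approximate |X_To(f)|^2 / T_o, where X_To is the Fourier transform of x on [-T_o/2, T_o/2]."""
    dt = T_o / num_steps
    total = 0j
    for k in range(num_steps):
        t = -T_o / 2 + (k + 0.5) * dt                  # midpoint of the k-th subinterval
        total += x(t) * cmath.exp(-2j * math.pi * f * t) * dt
    return abs(total) ** 2 / T_o

A, theta, fc = 2.0, 0.4, 10.0                          # assumed sample-path parameters
x = lambda t: A * math.cos(2 * math.pi * fc * t + theta)
T_o = 4.0                                              # holds an integer number of carrier periods

at_carrier = periodogram(x, T_o, fc)                   # grows like (A^2 / 4) * T_o
off_carrier = periodogram(x, T_o, fc + 3.0)            # essentially zero for this window
print(at_carrier, off_carrier)
```

For this sinusoid the estimate at the carrier frequency grows in proportion to the window length, which is the finite-window signature of a spectral line.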


PSD for a sample path: Formally, we define the PSD for a sample path in the limit of large
time windows as follows:

$$\bar{S}_x(f) = \lim_{T_o \to \infty} \frac{|X_{T_o}(f)|^2}{T_o} \qquad \text{PSD for sample path} \tag{5.78}$$

The preceding definition involves time averaging across a sample path, and can be related to the
time-averaged autocorrelation function, defined as follows.

Time-averaged autocorrelation function for a sample path: For a sample path $x(t)$, we
define the time-averaged autocorrelation function as

$$\bar{R}_x(\tau) = \overline{x(t)x^*(t - \tau)} = \lim_{T_o \to \infty} \frac{1}{T_o}\int_{-T_o/2}^{T_o/2} x(t)x^*(t - \tau)\, dt$$

We  now  state  the  following  important  result. 

Time-averaged  PSD  and  autocorrelation  function  form  a  Fourier  transform  pair. 

$$\bar{S}_x(f) \leftrightarrow \bar{R}_x(\tau) \tag{5.79}$$

We  omit  the  proof,  but  the  result  can  be  derived  using  the  techniques  of  Chapter  2. 

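Time-averaged PSD and autocorrelation function for running example: For our random
sinusoid (5.63), the time-averaged autocorrelation function for a sample path with amplitude $A$
and phase $\theta$ is given by

$$\begin{aligned}
\bar{R}_x(\tau) &= \overline{A\cos(2\pi f_c t + \theta)\, A\cos(2\pi f_c(t - \tau) + \theta)} \\
&= \overline{\frac{A^2}{2}\cos 2\pi f_c \tau + \frac{A^2}{2}\cos(4\pi f_c t - 2\pi f_c \tau + 2\theta)} \\
&= \frac{A^2}{2}\cos 2\pi f_c \tau
\end{aligned} \tag{5.80}$$

The time-averaged PSD is given by

$$\bar{S}_x(f) = \frac{A^2}{4}\delta(f - f_c) + \frac{A^2}{4}\delta(f + f_c) \tag{5.81}$$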
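
The time average in (5.80) can be reproduced numerically for a particular sample path. In the sketch below, the averaging window is chosen to contain an integer number of carrier periods so that the double-frequency term averages out exactly; the amplitude, phase, carrier frequency, and lag are assumed values, and the function name is hypothetical.

```python
import math

def time_averaged_autocorrelation(x, tau, T_o, num_steps=8000):
    """Midpoint-rule estimate of (1/T_o) * integral of x(t) x(t - tau) over [-T_o/2, T_o/2]."""
    dt = T_o / num_steps
    total = 0.0
    for k in range(num_steps):
        t = -T_o / 2 + (k + 0.5) * dt
        total += x(t) * x(t - tau) * dt        # x is real-valued, so no conjugate is needed
    return total / T_o

A, theta, fc = 1.5, 0.7, 4.0                   # assumed sample-path parameters
x = lambda t: A * math.cos(2 * math.pi * fc * t + theta)
T_o = 5.0                                      # 20 carrier periods
tau = 0.03

estimate = time_averaged_autocorrelation(x, tau, T_o)
closed_form = (A ** 2 / 2) * math.cos(2 * math.pi * fc * tau)
print(estimate, closed_form)
```

Note that the result depends on the amplitude of the particular path but not on its phase.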


We  now  extend  the  concept  of  PSD  to  a  statistical  average  as  follows. 


Ensemble-averaged PSD: The ensemble-averaged PSD for a random process is defined as
follows:

$$S_X(f) = \lim_{T_o \to \infty} E\left[\frac{|X_{T_o}(f)|^2}{T_o}\right] \qquad \text{ensemble averaged PSD} \tag{5.82}$$


That  is,  we  take  the  expectations  of  the  PSD  estimates  computed  over  an  observation  interval, 
and  then  let  the  observation  interval  get  large. 

Potential notational confusion: We use capital letters (e.g., $X(t)$) to denote a random process
and small letters (e.g., $x(t)$) to denote sample paths. However, we also use capital letters to
denote the Fourier transform of a time domain signal (e.g., $s(t) \leftrightarrow S(f)$), as introduced in
Chapter 2. Rather than introducing additional notation to resolve this potential ambiguity, we
rely on context to clarify the situation. In particular, (5.82) illustrates this potential problem. On
the left-hand side, we use $X$ to denote the random process whose PSD $S_X(f)$ we are interested in.
On the right-hand side, we use $X_{T_o}(f)$ to denote the Fourier transform of a windowed sample path
$x_{T_o}(t)$. Such opportunities for confusion arise seldom enough that it is not worth complicating
our notation to avoid them.


A  result  analogous  to  (5.79)  holds  for  ensemble-averaged  quantities  as  well. 

Ensemble-averaged PSD and autocorrelation function for WSS processes form a
Fourier transform pair (Wiener-Khintchine theorem). For a WSS process $X$ with
autocorrelation function $R_X(\tau)$, the ensemble-averaged PSD is the Fourier transform of the
ensemble-averaged autocorrelation function:

$$S_X(f) = \mathcal{F}(R_X(\tau)) = \int_{-\infty}^{\infty} R_X(\tau)e^{-j2\pi f\tau}\, d\tau \tag{5.83}$$

This result is called the Wiener-Khintchine theorem, and can be proved under mild conditions
on the autocorrelation function (the area under $|R_X(\tau)|$ must be finite and its Fourier transform
must exist). The proof requires advanced probability concepts beyond our scope here, and is
omitted.

Ensemble-averaged PSD for running example: For our running example, the PSD is
obtained by taking the Fourier transform of (5.73):

$$S_X(f) = \frac{1}{2}\delta(f - f_c) + \frac{1}{2}\delta(f + f_c) \tag{5.84}$$

That is, the power in $X$ is concentrated at $\pm f_c$, as we would expect for a sinusoidal signal at
frequency $f_c$.

Power: It follows from the Wiener-Khintchine theorem that the power of $X$ can be obtained
either by integrating the PSD or evaluating the autocorrelation function at $\tau = 0$:

$$P_X = R_X(0) = \int_{-\infty}^{\infty} S_X(f)\, df \tag{5.85}$$

For our running example, we obtain from (5.73) or (5.84) that $P_X = 1$.

Ensemble versus Time Averages: For our running example, we computed the
ensemble-averaged autocorrelation function $R_X(\tau)$ and then used the Wiener-Khintchine theorem to
compute the PSD by taking the Fourier transform. At other times, it is convenient to apply the
operational definition depicted in Figure 5.18, which involves averaging across time for a given
sample path. If the two approaches give the same answer, then the random process is said to be
ergodic in PSD. In practical terms, ergodicity means that designs based on statistical averages
across sample paths can be expected to apply to individual sample paths, and that
measurements carried out on a particular sample path can serve as a proxy for statistical averaging
across multiple realizations.

Comparing (5.81) and (5.84), we see that our running example is actually not ergodic in PSD.
For any sample path $x(t) = A\cos(2\pi f_c t + \theta)$, it is quite easy to show that

$$\bar{S}_x(f) = \frac{A^2}{4}\delta(f - f_c) + \frac{A^2}{4}\delta(f + f_c) \tag{5.86}$$

Comparing with (5.84), we see that the time-averaged PSD varies across sample paths due to
amplitude variations, with $A^2$ replaced by its expectation in the ensemble-averaged PSD.
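
The lack of ergodicity can be seen numerically by comparing per-path powers with their ensemble average. For a sample path with amplitude $A$, the time-averaged power is $A^2/2$ with $A^2 = X_1^2 + X_2^2$; the sketch below draws many such paths. The seed and the number of paths are illustrative assumptions, and the helper name is hypothetical.

```python
import random

def time_averaged_power(x1, x2):
    """Time-averaged power A^2 / 2 of the sample path A cos(2 pi fc t + theta), A^2 = x1^2 + x2^2."""
    return (x1 * x1 + x2 * x2) / 2.0

rng = random.Random(3)
powers = [time_averaged_power(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
          for _ in range(100_000)]

ensemble_power = sum(powers) / len(powers)    # approximates E[A^2] / 2
spread = max(powers) - min(powers)            # variation of time-averaged power across paths
print(ensemble_power, spread)
```

The average across paths settles near the ensemble power of one, while individual paths range from nearly zero to several times that value, so a measurement on one path is not a reliable proxy for the ensemble.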

Intuitively  speaking,  ergodicity  requires  sufficient  richness  of  variation  across  time  and  sample 
paths.  While  this  is  not  present  in  our  simple  running  example  (a  randomly  chosen  amplitude 
which  is  fixed  across  the  entire  sample  path  is  the  culprit),  it  is  often  present  in  the  more 
complicated  random  processes  of  interest  to  us,  including  receiver  noise  and  digitally  modulated 
signals  (under  appropriate  conditions  on  the  transmitted  symbol  sequences).  When  ergodicity 
holds,  we  have  our  choice  of  using  either  time  averaging  or  ensemble  averaging  for  computations, 
depending  on  which  is  most  convenient  or  insightful. 

The  autocorrelation  function  and  PSD  must  satisfy  the  following  structural  properties  (these 
apply  to  ensemble  averages  for  WSS  processes,  as  well  as  to  time  averages,  although  our  notation 
corresponds  to  ensemble  averages). 

Structural  properties  of  PSD  and  autocorrelation  function 

(P1) $S_X(f) \geq 0$ for all $f$.

This follows from the sample path based definition in Figure 5.18, since the output of the power
meter is always nonnegative. Averaging across sample paths preserves this property.

(P2a) The autocorrelation function is conjugate symmetric: $R_X(\tau) = R_X^*(-\tau)$.

This follows quite easily from the definition (5.71). By setting $t = u + \tau$, we have

$$R_X(\tau) = E[X(u + \tau)X^*(u)] = \left(E[X(u)X^*(u + \tau)]\right)^* = R_X^*(-\tau)$$

(P2b) For real-valued $X$, both the autocorrelation function and PSD are symmetric and
real-valued: $S_X(f) = S_X(-f)$ and $R_X(\tau) = R_X(-\tau)$.

(This is left as an exercise.)

Any function $g(\tau) \leftrightarrow G(f)$ must satisfy these properties in order to be a valid autocorrelation
function/PSD.

Example 5.7.1 (Which function is an autocorrelation?) For each of the following
functions, determine whether it is a valid autocorrelation function.

(a) $g_1(\tau) = \sin(\tau)$, (b) $g_2(\tau) = I_{[-1,1]}(\tau)$, (c) $g_3(\tau) = e^{-|\tau|}$

Solution

(a) This is not a valid autocorrelation function, since it is not symmetric and violates property
(P2b).

(b) This satisfies Property (P2b). However, $I_{[-1,1]}(\tau) \leftrightarrow 2\,\mathrm{sinc}(2f)$, so that Property (P1) is
violated, since the sinc function can take negative values. Hence, the boxcar function cannot be
a valid autocorrelation function. This example shows that the non-negativity Property (P1) places a
stronger constraint on the validity of a proposed function as an autocorrelation function than
the symmetry Property (P2).

(c) The function $g_3(\tau)$ is symmetric and satisfies Property (P2b). It is left as an exercise to check
that $G_3(f) \geq 0$, hence Property (P1) is also satisfied.
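
Parts (b) and (c) can be explored numerically. The sketch below evaluates the transform $2\,\mathrm{sinc}(2f)$ of the boxcar and the standard transform pair $e^{-|\tau|} \leftrightarrow 2/(1 + (2\pi f)^2)$ on a frequency grid, and cross-checks both against a direct numerical integration. The frequency grid, integration limits, and function names are assumptions made for this example.

```python
import math

def sinc(x):
    """sinc(x) = sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def G2(f):
    """Fourier transform of the boxcar I_[-1,1](tau)."""
    return 2.0 * sinc(2.0 * f)

def G3(f):
    """Fourier transform of exp(-|tau|)."""
    return 2.0 / (1.0 + (2.0 * math.pi * f) ** 2)

def numerical_transform(g, f, half_width, num_steps=20000):
    """Midpoint-rule integral of g(tau) cos(2 pi f tau) over [-half_width, half_width], for even g."""
    dt = 2.0 * half_width / num_steps
    total = 0.0
    for k in range(num_steps):
        tau = -half_width + (k + 0.5) * dt
        total += g(tau) * math.cos(2 * math.pi * f * tau) * dt
    return total

freqs = [k * 0.05 for k in range(-100, 101)]
min_G2 = min(G2(f) for f in freqs)     # negative, so (P1) fails for the boxcar
min_G3 = min(G3(f) for f in freqs)     # strictly positive on the grid

check_G2 = numerical_transform(lambda tau: 1.0, 0.75, 1.0)
check_G3 = numerical_transform(lambda tau: math.exp(-abs(tau)), 0.5, 40.0)
print(min_G2, min_G3)
```

A grid search of this kind can only refute nonnegativity, as it does for the boxcar; for $g_3$, the closed form settles the question since it is positive for every $f$.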

Units  for  PSD:  Power  per  unit  frequency  has  the  same  units  as  power  multiplied  by  time,  or 
energy.  Thus,  the  PSD  is  expressed  in  units  of  Watts/Hertz,  or  Joules. 


Figure 5.19: Operational definition of one-sided PSD. (The real-valued signal x(t) is passed through a filter with real-valued impulse response.)


One-sided PSD: The PSD that we have talked about so far is the two-sided PSD, which spans
both positive and negative frequencies. For a real-valued $X$, we can restrict attention to positive
frequencies alone in defining the PSD, by virtue of property (P2b). This yields the one-sided
PSD $S_X^+(f)$, defined as

$$S_X^+(f) = S_X(f) + S_X(-f) = 2S_X(f), \quad f \geq 0 \quad (X(t) \text{ real}) \tag{5.87}$$

It is useful to interpret this in terms of the sample path based operational definition shown in
Figure 5.19. The signal is passed through a physically realizable filter (i.e., with real-valued
impulse response) of bandwidth $\Delta f$, centered around $\nu$. The filter transfer function must be
conjugate symmetric, hence

$$H_\nu(f) = \begin{cases} 1, & \nu - \frac{\Delta f}{2} < f < \nu + \frac{\Delta f}{2} \\ 1, & -\nu - \frac{\Delta f}{2} < f < -\nu + \frac{\Delta f}{2} \\ 0, & \text{else} \end{cases}$$

The one-sided PSD is defined as the limit of the power of the filter output, divided by $\Delta f$, as
$\Delta f \to 0$. Comparing Figures 5.18 and 5.19, we have that the sample path based one-sided PSD
is simply twice the two-sided PSD: $\bar{S}_x^+(f) = \left(\bar{S}_x(f) + \bar{S}_x(-f)\right) I_{\{f \geq 0\}} = 2\bar{S}_x(f) I_{\{f \geq 0\}}$.

One-sided PSD for running example: From (5.84), we obtain that

$$S_X^+(f) = \delta(f - f_c) \tag{5.88}$$

with all the power concentrated at $f_c$, as expected.

Power in terms of PSD: We can express the power of a real-valued random process in terms
of either the one-sided or two-sided PSD:

$$P_X = \int_{-\infty}^{\infty} S_X(f)\, df = \int_0^{\infty} S_X^+(f)\, df \quad \text{(second equality for } X \text{ real)} \tag{5.89}$$

Baseband  and  passband  random  processes:  A  random  process  X  is  baseband  if  its  PSD 
is  baseband,  and  is  passband  if  its  PSD  is  passband.  Thinking  in  terms  of  time  averaged  PSDs, 
which  are  based  on  the  Fourier  transform  of  time  windowed  sample  paths,  we  see  that  a  random 
process  is  baseband  if  its  sample  paths,  time  windowed  over  a  large  enough  observation  interval, 
are  (approximately)  baseband.  Similarly,  a  random  process  is  passband  if  its  sample  paths, 
time  windowed  over  a  large  enough  observation  interval,  are  (approximately)  passband.  The 
caveat  of  “large  enough  observation  interval”  is  inserted  because  of  the  following  consideration: 
timelimited  signals  cannot  be  strictly  bandlimited,  but  as  long  as  the  observation  interval  is  large 
enough, the time windowing (which corresponds to convolving the spectrum with a sinc function)
does not spread out the spectrum of the signal significantly. Thus, the PSD (which is obtained
by taking the limit of large observation intervals) also defines the frequency occupancy of the sample
paths  over  large  enough  observation  intervals.  Note  that  these  intuitions,  while  based  on  time 
averaged  PSDs,  also  apply  when  bandwidth  occupancy  is  defined  in  terms  of  ensemble-averaged 
PSDs,  as  long  as  the  random  process  is  ergodic  in  PSD. 


Figure 5.20: The relation between the PSDs of a message and the corresponding DSB-SC signal. (Panels: message PSD; PSD of DSB-SC signal.)


Example (PSD of a modulated passband signal): Consider a passband signal $u_p(t) = m(t)\cos 2\pi f_0 t$,
where $m(t)$ is a message modeled as a baseband random process with PSD $S_m(f)$
and power $P_m$. Timelimiting to an interval of length $T_o$ and going to the frequency domain, we
have

$$U_{p,T_o}(f) = \frac{1}{2}\left(M_{T_o}(f - f_0) + M_{T_o}(f + f_0)\right) \tag{5.90}$$

Taking the magnitude squared, dividing by $T_o$, and letting $T_o$ get large, we obtain

$$S_{u_p}(f) = \frac{1}{4}\left(S_m(f - f_0) + S_m(f + f_0)\right) \tag{5.91}$$

The cross terms drop out in the limit because, assuming $f_0$ exceeds the message bandwidth, the
two shifted copies of the baseband spectrum do not overlap.

Thus, we start with the formula (5.90) relating the Fourier transforms for a given sample path,
which is identical to what we had in Chapter 2 (except that we now need to time limit the finite
power message to obtain a finite energy signal), and obtain the relation (5.91) relating the PSDs.
An example is shown in Figure 5.20. We can now integrate the PSDs to get

$$P_u = \frac{1}{4}(P_m + P_m) = \frac{P_m}{2}$$
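
The halving of power under DSB-SC modulation can be checked on a simple deterministic stand-in for the message. The sketch below uses a two-tone message (an assumption made purely for illustration, as are the carrier frequency and averaging window) and compares time-averaged powers before and after multiplication by the carrier.

```python
import math

def time_averaged_power(x, T_o, num_steps=20000):
    """Midpoint-rule estimate of (1/T_o) * integral of x(t)^2 over [-T_o/2, T_o/2]."""
    dt = T_o / num_steps
    return sum(x(-T_o / 2 + (k + 0.5) * dt) ** 2 for k in range(num_steps)) * dt / T_o

f0 = 20.0                                            # assumed carrier frequency, well above the message band
m = lambda t: math.cos(2 * math.pi * 1.0 * t) + 0.5 * math.sin(2 * math.pi * 2.0 * t)
u_p = lambda t: m(t) * math.cos(2 * math.pi * f0 * t)

T_o = 10.0                                           # integer number of periods of every tone involved
P_m = time_averaged_power(m, T_o)
P_u = time_averaged_power(u_p, T_o)
print(P_m, P_u)                                      # P_u should equal P_m / 2
```

If `f0` were lowered into the message band, the two shifted spectra would overlap and the factor of one half would no longer be guaranteed.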


5.7.6  Gaussian  random  processes 

Gaussian  random  processes  are  just  generalizations  of  Gaussian  random  vectors  to  an  arbitrary 
number  of  components  (countable  or  uncountable). 

Gaussian random process: A random process $X = \{X(t), t \in T\}$ is said to be Gaussian if
any linear combination of samples is a Gaussian random variable. That is, for any number $n$ of
samples, any sampling times $t_1, ..., t_n$, and any scalar constants $a_1, ..., a_n$, the linear combination
$a_1X(t_1) + ... + a_nX(t_n)$ is a Gaussian random variable. Equivalently, the samples $X(t_1), ..., X(t_n)$
are jointly Gaussian.

Our  running  example  (5.62)  is  a  Gaussian  random  process,  since  any  linear  combination  of 
samples  is  a  linear  combination  of  the  jointly  Gaussian  random  variables  X\  and  X2,  and  is 
therefore  a  Gaussian  random  variable. 

A  linear  combination  of  samples  from  a  Gaussian  random  process  is  completely  characterized  by 
its  mean  and  variance.  To  compute  the  latter  quantities  for  an  arbitrary  linear  combination,  we 
can  show,  as  we  did  for  random  vectors,  that  all  we  need  to  know  are  the  mean  function  (analogous 
to  the  mean  vector)  and  the  autocovariance  function  (analogous  to  the  covariance  matrix)  of  the 
random  process.  These  functions  therefore  provide  a  complete  statistical  characterization  of  a 
Gaussian  random  process,  since  the  definition  of  a  Gaussian  random  process  requires  only  that 
we  be  able  to  characterize  the  distribution  of  an  arbitrary  linear  combination  of  samples. 

Characterizing a Gaussian random process: The statistics of a Gaussian random process
are completely specified by its mean function $m_X(t) = E[X(t)]$ and its autocovariance function
$C_X(t_1, t_2) = E[(X(t_1) - m_X(t_1))(X(t_2) - m_X(t_2))]$. Given the mean function, the autocorrelation
function $R_X(t_1, t_2) = E[X(t_1)X(t_2)]$ can be computed from $C_X(t_1, t_2)$, and vice versa, using the
following relation:

$$R_X(t_1, t_2) = C_X(t_1, t_2) + m_X(t_1)m_X(t_2) \tag{5.92}$$

It therefore also follows that a Gaussian random process is completely specified by its mean and
autocorrelation functions.

WSS  Gaussian  random  processes  are  stationary:  We  know  that  a  stationary  random 
process  is  WSS.  The  converse  is  not  true  in  general,  but  Gaussian  WSS  processes  are  indeed 
stationary.  This  is  because  the  statistics  of  a  Gaussian  random  process  are  characterized  by  its 
first  and  second  order  statistics,  and  if  these  are  shift-invariant  (as  they  are  for  WSS  processes), 
the  random  process  is  statistically  indistinguishable  under  a  time  shift. 


Example 5.7.2 Suppose that $Y$ is a Gaussian random process with mean function $m_Y(t) = 3t$
and autocorrelation function $R_Y(t_1, t_2) = 4e^{-|t_1 - t_2|} + 9t_1t_2$.

(a) Find the probability that $Y(2)$ is bigger than 10.

(b) Specify the joint distribution of $Y(2)$ and $Y(3)$.

(c) True or False: $Y$ is stationary.

(d) True or False: The random process $Z(t) = Y(t) - 3t$ is stationary.

Solution: (a) Since $Y$ is a Gaussian random process, the sample $Y(2)$ is a Gaussian random
variable with mean $m_Y(2) = 6$ and variance $C_Y(2, 2) = R_Y(2, 2) - (m_Y(2))^2 = 4$. More generally,
note that the autocovariance function of $Y$ is given by

$$C_Y(t_1, t_2) = R_Y(t_1, t_2) - m_Y(t_1)m_Y(t_2) = 4e^{-|t_1 - t_2|} + 9t_1t_2 - (3t_1)(3t_2) = 4e^{-|t_1 - t_2|}$$

so that $\mathrm{var}(Y(t)) = C_Y(t, t) = 4$ for any sampling time $t$.

We have shown that $Y(2) \sim N(6, 4)$, so that

$$P[Y(2) > 10] = Q\left(\frac{10 - 6}{\sqrt{4}}\right) = Q(2)$$

(b) Since $Y$ is a Gaussian random process, $Y(2)$ and $Y(3)$ are jointly Gaussian, with distribution
specified by the mean vector and covariance matrix given by

$$\mathbf{m} = \begin{pmatrix} m_Y(2) \\ m_Y(3) \end{pmatrix} = \begin{pmatrix} 6 \\ 9 \end{pmatrix}, \qquad
\mathbf{C} = \begin{pmatrix} C_Y(2, 2) & C_Y(2, 3) \\ C_Y(3, 2) & C_Y(3, 3) \end{pmatrix} = \begin{pmatrix} 4 & 4e^{-1} \\ 4e^{-1} & 4 \end{pmatrix}$$

(c) $Y$ has time-varying mean, and hence is not WSS. This implies it is not stationary. The
statement is therefore False.

(d) $Z(t) = Y(t) - 3t = Y(t) - m_Y(t)$ is the zero mean version of $Y$. It inherits the Gaussianity of
$Y$. The mean function is $m_Z(t) = 0$, and the autocorrelation function, given by

$$R_Z(t_1, t_2) = E[(Y(t_1) - m_Y(t_1))(Y(t_2) - m_Y(t_2))] = C_Y(t_1, t_2) = 4e^{-|t_1 - t_2|}$$

depends on the time difference $t_1 - t_2$ alone. Thus, $Z$ is WSS. Since it is also Gaussian, this implies
that $Z$ is stationary. The statement is therefore True.
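
The numbers in this example are easy to reproduce. The sketch below encodes the given mean and autocorrelation functions, derives the autocovariance, and evaluates $Q(2)$ using the standard identity $Q(x) = \frac{1}{2}\mathrm{erfc}(x/\sqrt{2})$. The function names are illustrative choices.

```python
import math

def Q(x):
    """Gaussian tail probability Q(x) = P[N(0,1) > x], via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def m_Y(t):
    """Mean function given in the example."""
    return 3.0 * t

def R_Y(t1, t2):
    """Autocorrelation function given in the example."""
    return 4.0 * math.exp(-abs(t1 - t2)) + 9.0 * t1 * t2

def C_Y(t1, t2):
    """Autocovariance: autocorrelation minus the product of the means."""
    return R_Y(t1, t2) - m_Y(t1) * m_Y(t2)

# Part (a): Y(2) is Gaussian with mean m_Y(2) and variance C_Y(2, 2)
mean_2, var_2 = m_Y(2.0), C_Y(2.0, 2.0)
prob = Q((10.0 - mean_2) / math.sqrt(var_2))

# Part (b): mean vector and covariance matrix of (Y(2), Y(3))
mean_vector = [m_Y(2.0), m_Y(3.0)]
cov_matrix = [[C_Y(2.0, 2.0), C_Y(2.0, 3.0)],
              [C_Y(3.0, 2.0), C_Y(3.0, 3.0)]]
print(prob, mean_vector, cov_matrix)
```

The probability in part (a) comes out to roughly 0.023.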


5.8  Noise  Modeling 


Figure 5.21: The PSD of passband white noise is flat over the band of interest. (Panels: two-sided PSD; one-sided PSD.)


We now have the background required to discuss mathematical modeling of noise in communication
systems. A generic model for receiver noise is that it is a random process with zero DC
value, and with PSD which is flat, or white, over a band of interest. The key noise mechanisms
in a communication receiver, thermal and shot noise, are both white, as discussed in Appendix
5.C. For example, Figure 5.21 shows the two-sided PSD of passband white noise $n_p(t)$, which is
given by

$$S_{n_p}(f) = \begin{cases} N_0/2, & |f - f_c| \le B/2 \\ N_0/2, & |f + f_c| \le B/2 \\ 0, & \text{else} \end{cases}$$

Since $n_p(t)$ is real-valued, we can also define the one-sided PSD as follows:

$$S^+_{n_p}(f) = \begin{cases} N_0, & |f - f_c| \le B/2 \\ 0, & \text{else} \end{cases}$$

That is, white noise has two-sided PSD $N_0/2$ and one-sided PSD $N_0$ over the band of interest.
The power of the white noise is given by

$$\int_{-\infty}^{\infty} S_{n_p}(f)\,df = (N_0/2)\,2B = N_0B$$

The PSD $N_0$ is in units of Watts/Hertz, or Joules.

Figure 5.22: The PSD of baseband white noise. (Panels: two-sided PSD; one-sided PSD.)


Similarly, Figure 5.22 shows the one-sided and two-sided PSDs for real-valued white noise in a
physical baseband system with bandwidth $B$. The power of this baseband white noise is again
$N_0B$. As we discuss in Section 5.D, passband random processes, like deterministic passband
signals, can be represented in terms of I and Q components. We note in Section 5.D that
the I and Q components of passband white noise are baseband white noise processes, and that
the corresponding complex envelope is complex-valued white noise.

Noise Figure: The value of $N_0$ summarizes the net effects of white noise arising from various
devices in the receiver. Comparing the noise power $N_0B$ with the nominal figure of $kTB$ for
thermal noise of a resistor with matched impedance, we define the noise figure as

$$F = \frac{N_0}{kT_{\text{room}}}$$

where $k = 1.38 \times 10^{-23}$ Joules/Kelvin is Boltzmann’s constant, and the nominal “room
temperature” is taken by convention to be $T_{\text{room}} = 290$ Kelvin (the product $kT_{\text{room}} \approx 4 \times 10^{-21}$ Joules,
so that the numbers work out well for this slightly chilly choice of room temperature at about 62°
Fahrenheit). Noise figure is usually expressed in dB.

The noise power for a bandwidth $B$ is given by (with the noise figure $F$ in the exponent expressed in dB)

$$P_n = N_0B = kT_{\text{room}}\,10^{F/10}B$$


dBW and dBm: It is customary to express power on the decibel (dB) scale:

$$\text{Power (dBW)} = 10\log_{10}(\text{Power (watts)})$$

$$\text{Power (dBm)} = 10\log_{10}(\text{Power (milliwatts)})$$

On the dB scale, the noise power over 1 Hz is therefore given by

$$\text{Noise power over 1 Hz} = -174 + F \ \text{dBm} \qquad (5.93)$$

Thus, the noise power in dBm over a bandwidth of $B$ Hz is given by

$$P_n(\text{dBm}) = -174 + F + 10\log_{10}B \ \text{dBm} \qquad (5.94)$$
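
The constant $-174$ in (5.93) is the thermal noise floor $kT_{\text{room}}$ expressed in dBm per Hz. As a short worked step, using only the values of $k$ and $T_{\text{room}}$ quoted in the noise figure discussion:

```latex
10\log_{10}\!\left(\frac{kT_{\text{room}}}{1\ \text{mW}}\right)
  = 10\log_{10}\!\left(\frac{(1.38\times 10^{-23})(290)\ \text{W/Hz}}{10^{-3}\ \text{W}}\right)
  = 10\log_{10}\!\left(4.0\times 10^{-18}\right)
  \approx -174\ \text{dBm/Hz}
```

Adding the noise figure in dB and $10\log_{10}B$ to this floor then gives (5.94).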

Example 5.8.1 (Noise power computation) A 5 GHz Wireless Local Area Network (WLAN)
link has a receiver bandwidth $B$ of 20 MHz. If the receiver has a noise figure of 6 dB, what is
the receiver noise power $P_n$?

Solution: The noise power is

$$P_n = N_0B = kT_{\text{room}}\,10^{F/10}B = (1.38 \times 10^{-23})(290)(10^{6/10})(20 \times 10^{6})
= 3.2 \times 10^{-13}\ \text{Watts} = 3.2 \times 10^{-10}\ \text{milliWatts (mW)}$$

The noise power is often expressed in dBm, which is obtained by converting the raw number in
milliWatts (mW) into dB. We therefore get

$$P_{n,\text{dBm}} = 10\log_{10}P_n(\text{mW}) = -95\ \text{dBm}$$

Let us now redo this computation in the “dB domain,” where the contributions to the noise
power due to the various system parameters simply add up. Using (5.93), the noise power in our
system can be calculated as follows:

$$P_n(\text{dBm}) = -174 + \text{Noise Figure (dB)} + 10\log_{10}\text{Bandwidth (Hz)} \qquad (5.95)$$

In our current example, we obtain $P_n(\text{dBm}) = -174 + 6 + 73 = -95$ dBm, as before.

Figure 5.23: Since receiver processing always involves some form of band limitation, it is not
necessary to impose band limitation on the WGN model.
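
The two routes to the noise power in Example 5.8.1 (linear scale and dB domain) can be compared in a few lines of code. This is a minimal sketch: the function and constant names are illustrative choices, not from the text, and the $-174$ dBm/Hz floor is the rounded value used in (5.93).

```python
import math

BOLTZMANN_K = 1.38e-23   # Joules/Kelvin, value quoted in the text
T_ROOM = 290.0           # Kelvin, nominal room temperature

def noise_power_watts(noise_figure_db, bandwidth_hz):
    """Linear-scale form: P_n = k * T_room * 10^(F/10) * B, with F in dB."""
    return BOLTZMANN_K * T_ROOM * 10 ** (noise_figure_db / 10.0) * bandwidth_hz

def watts_to_dbm(power_watts):
    """Convert a power in Watts to dBm (dB relative to 1 mW)."""
    return 10.0 * math.log10(power_watts / 1e-3)

def noise_power_dbm(noise_figure_db, bandwidth_hz):
    """dB-domain form: -174 + F + 10 log10(B)."""
    return -174.0 + noise_figure_db + 10.0 * math.log10(bandwidth_hz)

# Parameters of Example 5.8.1: F = 6 dB, B = 20 MHz
p_watts = noise_power_watts(6.0, 20e6)
p_dbm_linear = watts_to_dbm(p_watts)
p_dbm_db = noise_power_dbm(6.0, 20e6)
print(p_watts, p_dbm_linear, p_dbm_db)
```

The two dBm values differ by a few hundredths of a dB, which is the rounding error incurred by using $-174$ for the noise floor.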

We now add two more features to our noise model that greatly simplify computations. First, we
assume that the noise is a Gaussian random process. The physical basis for this is that noise
arises due to the random motion of a large number of charge carriers, which leads to Gaussian
statistics based on the central limit theorem (see Section 5.B). The mathematical consequence
of Gaussianity is that we can compute probabilities based only on knowledge of second order
statistics. Second, we remove band limitation, implicitly assuming that it will be imposed later
by filtering at the receiver. That is, we model noise $n(t)$ (where $n$ can be real-valued passband
or baseband white noise) as a zero mean WSS random process with PSD flat over the entire
real line, $S_n(f) = \frac{N_0}{2}$. The corresponding autocorrelation function is $R_n(\tau) = \frac{N_0}{2}\delta(\tau)$. This
model is clearly physically unrealizable, since the noise power is infinite. However, since receiver
processing in bandlimited systems always involves filtering, we can assume that the receiver noise
prior to filtering is not bandlimited and still get the right answer. Figure 5.23 shows the steps
we use to go from receiver noise in bandlimited systems to infinite-power White Gaussian Noise
(WGN), which we formally define below.

White Gaussian Noise: Real-valued WGN $n(t)$ is a zero mean, WSS, Gaussian random process
with $S_n(f) = N_0/2 = \sigma^2$. Equivalently, $R_n(\tau) = \frac{N_0}{2}\delta(\tau) = \sigma^2\delta(\tau)$. The quantity $N_0/2 = \sigma^2$
is often termed the two-sided PSD of WGN, since we must integrate over both positive and
negative frequencies in order to compute power using this PSD. The quantity $N_0$ is therefore
referred to as the one-sided PSD, and has the dimension of Watts/Hertz, or Joules.

The  following  example  provides  a  preview  of  typical  computations  for  signaling  in  WGN,  and 
illustrates  why  the  model  is  so  convenient. 

Example 5.8.2 (On-off keying in continuous time): A receiver in an on-off keyed system
receives the signal $y(t) = s(t) + n(t)$ if 1 is sent, and receives $y(t) = n(t)$ if 0 is sent, where $n(t)$
is WGN with PSD $\sigma^2 = \frac{N_0}{2}$. The receiver computes the following decision statistic:

$$Y = \int y(t)s(t)\,dt$$

(We shall soon show that this is actually the best thing to do.)

(a)  Find  the  conditional  distribution  of  Y  if  0  is  sent. 

(b)  Find  the  conditional  distribution  of  Y  if  1  is  sent. 

(c)  Compare  with  the  on-off  keying  model  in  Example  5.6.3. 

Solution: 

(a) Conditioned on 0 being sent, $y(t) = n(t)$ and hence $Y = \int n(t)s(t)\,dt$. Since $n$ is Gaussian,
and $Y$ is obtained from it by linear processing, $Y$ is a Gaussian random variable (conditioned on
0 being sent). Thus, the conditional distribution of $Y$ is completely characterized by its mean
and variance, which we now compute.

$$E[Y] = E\left[\int n(t)s(t)\,dt\right] = \int s(t)E[n(t)]\,dt = 0$$

where we can interchange expectation and integration because both are linear operations.
Actually, there are some mathematical conditions (beyond our scope here) that need to be satisfied for
such “natural” interchanges to be permitted, but these conditions are met for all the examples
that we consider in this text. Since the mean is zero, the variance is given by


$$\mathrm{var}(Y) = E[Y^2] = E\left[\int n(t)s(t)\,dt \int n(u)s(u)\,du\right]$$

Notice that we have written out $Y^2 = Y \times Y$ as the product of two identical integrals, but
with the “dummy” variables of integration chosen to be different. This is because we need to
consider all possible cross terms that could result from multiplying the integral with itself. We
now interchange expectation and integration again, noting that all random quantities must be
grouped inside the expectation. This gives us

$$\mathrm{var}(Y) = \int\!\!\int E[n(t)n(u)]\,s(t)s(u)\,dt\,du \qquad (5.96)$$

Now this is where the WGN model makes our life simple. The autocorrelation function is

$$E[n(t)n(u)] = \sigma^2\delta(t-u)$$

Plugging into (5.96), the delta function collapses the two integrals into one, and we obtain

$$\mathrm{var}(Y) = \sigma^2\int\!\!\int \delta(t-u)\,s(t)s(u)\,dt\,du = \sigma^2\int s^2(t)\,dt = \sigma^2\|s\|^2$$


We have therefore shown that $Y \sim N(0, \sigma^2\|s\|^2)$ conditioned on 0 being sent.

(b) Suppose that 1 is sent. Then $y(t) = s(t) + n(t)$ and

$$Y = \int (s(t) + n(t))\,s(t)\,dt = \int s^2(t)\,dt + \int n(t)s(t)\,dt = \|s\|^2 + \int n(t)s(t)\,dt$$

We already know that the second term on the extreme right hand side has distribution $N(0, \sigma^2\|s\|^2)$.
The distribution remains Gaussian when we add a constant to it, with the mean being translated
by this constant. We therefore conclude that $Y \sim N(\|s\|^2, \sigma^2\|s\|^2)$, conditioned on 1 being sent.

(c) The decision statistic $Y$ obeys exactly the same model as in Example 5.6.3, with $m = \|s\|^2$
and $v^2 = \sigma^2\|s\|^2$. Applying the intuitive decision rule in that example, we guess that 1 is sent if
$Y > \|s\|^2/2$, and that 0 is sent otherwise. The probability of error for that decision rule equals

$$Q\left(\frac{m}{2v}\right) = Q\left(\frac{\|s\|^2}{2\sigma\|s\|}\right) = Q\left(\frac{\|s\|}{2\sigma}\right)$$


Remark:  The  preceding  example  illustrates  that,  for  linear  processing  of  a  received  signal 
corrupted  by  WGN,  the  signal  term  contributes  to  the  mean,  and  the  noise  term  to  the  variance,  of 
the  resulting  decision  statistic.  The  resulting  Gaussian  distribution  is  a  conditional  distribution, 
because  it  is  conditioned  on  which  signal  is  actually  sent  (or,  for  on-off  keying,  whether  a  signal 
is  sent). 
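
The conditional distributions found in Example 5.8.2 can be checked with a small Monte Carlo simulation. This is only a sketch under stated assumptions: the pulse shape, the value of $\sigma^2$, the step size `dt` and the number of trials are arbitrary illustrative choices, and continuous-time WGN is approximated by independent Gaussian samples of variance $\sigma^2/\text{dt}$, the discrete stand-in for the autocorrelation $\sigma^2\delta(t-u)$. For this Gaussian model, the threshold rule of part (c) has error probability $Q(\|s\|/(2\sigma))$, which the sketch also estimates.

```python
import math
import random

def q_function(x):
    """Gaussian complementary CDF Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def simulate_decision_statistic(bit, s_samples, dt, sigma2, rng):
    """One realization of Y = integral of y(t) s(t) dt, as a Riemann sum.

    Discretized WGN: independent N(0, sigma2 / dt) samples.
    """
    noise_std = math.sqrt(sigma2 / dt)
    total = 0.0
    for s_k in s_samples:
        y_k = bit * s_k + rng.gauss(0.0, noise_std)
        total += y_k * s_k * dt
    return total

# Hypothetical choices: unit-amplitude rectangular pulse on [0, 1), sigma^2 = 0.25
dt = 0.02
s_samples = [1.0] * 50
sigma2 = 0.25
energy = sum(s_k ** 2 for s_k in s_samples) * dt   # ||s||^2

rng = random.Random(0)
trials = 4000
y0 = [simulate_decision_statistic(0, s_samples, dt, sigma2, rng) for _ in range(trials)]
y1 = [simulate_decision_statistic(1, s_samples, dt, sigma2, rng) for _ in range(trials)]

mean0 = sum(y0) / trials
var0 = sum((y - mean0) ** 2 for y in y0) / (trials - 1)
mean1 = sum(y1) / trials

# Threshold rule from part (c): decide 1 if Y > ||s||^2 / 2
errors = sum(y > energy / 2 for y in y0) + sum(y <= energy / 2 for y in y1)
p_error_sim = errors / (2 * trials)
p_error_theory = q_function(math.sqrt(energy) / (2 * math.sqrt(sigma2)))

print("mean given 0:", mean0, "variance given 0:", var0)
print("mean given 1:", mean1)
print("error probability (simulated, theory):", p_error_sim, p_error_theory)
```

The sample mean and variance should be close to $0$ and $\sigma^2\|s\|^2$ when 0 is sent, and the sample mean close to $\|s\|^2$ when 1 is sent, up to Monte Carlo fluctuations.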

Complex baseband WGN: Based on the definition of complex envelope that we have used
so far (in Chapters 2 through 4), the complex envelope has twice the energy/power of the
corresponding passband signal (which may be a sample path of a passband random process). In
order to get a unified description of WGN, however, let us now divide the complex envelope of
both signal and noise by $\sqrt{2}$. This cannot change the performance of the system, but leads to
the complex envelope now having the same energy/power as the corresponding passband signal.
Effectively, we are switching from defining the complex envelope via $u_p(t) = \mathrm{Re}\left(u(t)e^{j2\pi f_ct}\right)$ to
defining it via $u_p(t) = \mathrm{Re}\left(\sqrt{2}\,u(t)e^{j2\pi f_ct}\right)$. This convention reduces the PSDs of the I and Q
components by a factor of two: we now model them as independent real WGN processes, with
$S_{n_c}(f) = S_{n_s}(f) = N_0/2 = \sigma^2$. The steps in establishing this model are shown in Figure 5.24.

We  now  have  the  noise  modeling  background  needed  for  Chapter  6,  where  we  develop  a  framework 
for  optimal  reception,  based  on  design  criteria  such  as  the  error  probability.  The  next  section 
discusses linear processing of random processes, which is useful background for modeling
the effect of filtering on noise, as well as for computing quantities such as signal-to-noise ratio
(SNR). It can be skipped by readers anxious to get to Chapter 6, since the latter includes a
self-contained exposition of the effects of the relevant receiver operations on WGN.

Figure 5.24: We scale the complex envelope for both signal and noise by $\frac{1}{\sqrt{2}}$, so that the I and Q
components of passband WGN can be modeled as independent WGN processes with PSD $N_0/2$.


5.9  Linear  Operations  on  Random  Processes 

We  now  wish  to  understand  what  happens  when  we  perform  linear  operations  such  as  filtering 
and  correlation  on  a  random  process.  We  have  already  seen  an  example  of  this  in  Example 
5.8.2,  where  WGN  was  correlated  against  a  deterministic  signal.  We  now  develop  a  more  general 
framework. 

It  is  useful  to  state  up  front  the  following  result. 

Gaussianity  is  preserved  under  linear  operations:  Thus,  if  the  input  to  a  filter  is  a  Gaussian 
random  process,  so  is  the  output. 

This  is  because  any  set  of  output  samples  can  be  expressed  as  a  linear  combination  of  input 
samples,  or  the  limit  of  such  linear  combinations  (an  integral  for  computing,  for  example,  a 
convolution,  is  the  limit  of  a  sum). 

In the remainder of this section, we discuss the evolution of second order statistics under
linear operations. Of course, for Gaussian random processes, this suffices to provide a complete
statistical description of the output of a linear operation.


5.9.1  Filtering 

Suppose that a random process $x(t)$ is passed through a filter, or an LTI system, with transfer
function $G(f)$ and impulse response $g(t)$, as shown in Figure 5.25.

The PSD of the output $y(t)$ is related to that of the input as follows:

$$S_y(f) = S_x(f)|G(f)|^2 \qquad (5.97)$$

Figure 5.25: Random process through an LTI system.

This follows immediately from the operational definition of PSD in Figure 5.18, since the power
gain due to the filter at frequency $f$ is $|G(f)|^2$. Now,

$$|G(f)|^2 = G(f)G^*(f) \leftrightarrow (g * g_{MF})(t)$$

where $g_{MF}(t) = g^*(-t)$. Thus, taking the inverse Fourier transform on both sides of (5.97), we
obtain the following relation between the input and output autocorrelation functions:

$$R_y(\tau) = (R_x * g * g_{MF})(\tau) \qquad (5.98)$$

Let  us  now  derive  analogous  results  for  ensemble  averages  for  filtered  WSS  processes. 


Filtered  WSS  random  processes 

Suppose that a WSS random process $X$ is passed through an LTI system with impulse response
$g(t)$ (which we allow to be complex-valued) to obtain an output $Y(t) = (X * g)(t)$. We wish to
characterize the joint second order statistics of $X$ and $Y$.

Defining the crosscorrelation function of $Y$ and $X$ as

$$R_{YX}(t+\tau, t) = E[Y(t+\tau)X^*(t)]$$

we have

$$R_{YX}(t+\tau, t) = E\left[\left(\int X(t+\tau-u)g(u)\,du\right)X^*(t)\right] = \int R_X(\tau-u)g(u)\,du \qquad (5.99)$$

interchanging expectation and integration. Thus, $R_{YX}(t+\tau, t)$ depends only on the time
difference $\tau$. We therefore denote it by $R_{YX}(\tau)$. From (5.99), we see that

$$R_{YX}(\tau) = (R_X * g)(\tau)$$

The autocorrelation function of $Y$ is given by

$$\begin{aligned}
R_Y(t+\tau, t) &= E[Y(t+\tau)Y^*(t)] = E\left[Y(t+\tau)\left(\int X(t-u)g(u)\,du\right)^*\right] \\
&= \int E[Y(t+\tau)X^*(t-u)]\,g^*(u)\,du = \int R_{YX}(\tau+u)g^*(u)\,du
\end{aligned} \qquad (5.100)$$

Thus, $R_Y(t+\tau, t)$ depends only on the time difference $\tau$, and we denote it by $R_Y(\tau)$. Recalling
that the matched filter $g_{MF}(u) = g^*(-u)$ and replacing $u$ by $-u$ in the integral at the end of
(5.100), we obtain that

$$R_Y(\tau) = (R_{YX} * g_{MF})(\tau) = (R_X * g * g_{MF})(\tau)$$

Finally, we note that the mean function of $Y$ is a constant given by

$$m_Y = m_X * g = m_X\int g(u)\,du = m_XG(0)$$

Thus, $X$ and $Y$ are jointly WSS: $X$ is WSS, $Y$ is WSS, and their crosscorrelation function depends
on the time difference. The formulas for the second order statistics, including the corresponding
power spectral densities obtained by taking Fourier transforms, are collected below:

$$\begin{aligned}
R_{YX}(\tau) &= (R_X * g)(\tau), & S_{YX}(f) &= S_X(f)G(f) \\
R_Y(\tau) &= (R_{YX} * g_{MF})(\tau) = (R_X * g * g_{MF})(\tau), & S_Y(f) &= S_{YX}(f)G^*(f) = S_X(f)|G(f)|^2
\end{aligned} \qquad (5.101)$$

Let  us  apply  these  results  to  infinite  power  white  noise  (we  do  not  need  to  invoke  Gaussianity  to 
compute  second  order  statistics).  While  the  input  has  infinite  power,  as  shown  in  the  example 
below,  if  the  filter  impulse  response  is  square  integrable,  then  the  output  has  finite  power,  and 
is  equal  to  what  we  would  have  obtained  if  we  had  assumed  that  the  noise  was  bandlimited  to 
start  with. 

Example 5.9.1 (white noise through an LTI system: general formulas) White noise
with PSD $S_n(f) = \frac{N_0}{2}$ is passed through an LTI system with impulse response $g(t)$. We wish to
find the PSD, autocorrelation function, and power of the output $y(t) = (n * g)(t)$. The PSD is
given by

$$S_y(f) = S_n(f)|G(f)|^2 = \frac{N_0}{2}|G(f)|^2 \qquad (5.102)$$

We can compute the autocorrelation function directly or take the inverse Fourier transform of
the PSD to obtain

$$R_y(\tau) = (R_n * g * g_{MF})(\tau) = \frac{N_0}{2}(g * g_{MF})(\tau) = \frac{N_0}{2}\int g(s)g^*(s-\tau)\,ds \qquad (5.103)$$

The output power is given by

$$\int_{-\infty}^{\infty} S_y(f)\,df = \frac{N_0}{2}\int_{-\infty}^{\infty}|G(f)|^2\,df = \frac{N_0}{2}\int_{-\infty}^{\infty}|g(t)|^2\,dt = \frac{N_0}{2}\|g\|^2 \qquad (5.104)$$

where the time domain expression follows from Parseval’s identity, or from setting $\tau = 0$ in
(5.103). Thus, the output noise power equals the noise PSD times the energy of the filter impulse
response.  It  is  worth  noting  that  the  PSD  of  y  is  the  same  as  what  we  would  have  gotten  if  the 
input  were  bandlimited  white  noise,  as  long  as  the  band  is  large  enough  to  encompass  frequencies 
where  G(f)  is  nonzero.  Even  if  G(f)  is  not  strictly  bandlimited,  we  get  approximately  the  right 
answer  if  the  input  noise  bandwidth  is  large  enough  so  that  most  of  the  energy  in  G(f)  falls 
within  it. 
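
The claim that bandlimited and infinite-bandwidth white noise give (nearly) the same output power can be illustrated numerically. The sketch below rests on assumptions that are not in the text: the impulse response $g(t) = e^{-t}$ for $t \ge 0$ (so that $G(f) = 1/(1 + j2\pi f)$ and $\|g\|^2 = 1/2$), the value $N_0 = 2$, and a simple midpoint rule for the integrals. It compares the time-domain form of (5.104) with the frequency-domain integral when the input noise PSD is $N_0/2$ only for $|f| \le W$.

```python
import math

N0 = 2.0  # hypothetical one-sided PSD, so that N0/2 = 1

def g(t):
    """Hypothetical impulse response: one-sided exponential exp(-t), t >= 0."""
    return math.exp(-t) if t >= 0 else 0.0

def gain_squared(f):
    """|G(f)|^2 for g(t) = exp(-t) u(t), i.e. G(f) = 1 / (1 + j 2 pi f)."""
    return 1.0 / (1.0 + (2.0 * math.pi * f) ** 2)

def integrate(func, a, b, n):
    """Midpoint-rule integration of func over [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))

# Time-domain form of (5.104): (N0/2) * ||g||^2
energy_g = integrate(lambda t: g(t) ** 2, 0.0, 20.0, 20000)
power_time = (N0 / 2.0) * energy_g

def output_power_bandlimited(w):
    """Output power when the input noise PSD is N0/2 only for |f| <= w."""
    return (N0 / 2.0) * integrate(gain_squared, -w, w, 200000)

for w in (0.1, 1.0, 10.0, 100.0):
    print("W =", w, "output power =", output_power_bandlimited(w))
print("time-domain value (N0/2)||g||^2 =", power_time)
```

As $W$ grows past the frequencies where $|G(f)|^2$ is significant, the bandlimited result approaches $(N_0/2)\|g\|^2$, consistent with the remark at the end of the example.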

When the input random process is Gaussian as well as WSS, the output is also WSS and Gaussian,
and the preceding computations of second order statistics provide a complete statistical
characterization of the output process. This is illustrated by the following example, in which
WGN is passed through a filter.

Example 5.9.2 (WGN through a boxcar impulse response) Suppose that WGN $n(t)$
with PSD $\sigma^2 = \frac{N_0}{2} = \frac{1}{4}$ is passed through an LTI system with impulse response $g(t) = I_{[0,2]}(t)$ to
obtain the output $y(t) = (n * g)(t)$.

(a) Find the autocorrelation function and PSD of $y$.

(b) Find $E[y^2(100)]$.

(c) True or False: $y$ is a stationary random process.

(d) True or False: $y(100)$ and $y(102)$ are independent random variables.

(e) True or False: $y(100)$ and $y(101)$ are independent random variables.

(f) Compute the probability $P[y(100) - 2y(101) + 3y(102) > 5]$.

(g) Which of the preceding results rely on the Gaussianity of $n$?

Solution 

(a) Since $n$ is WSS, so is $y$. The filter matched to $g$ is a boxcar as well: $g_{MF}(t) = I_{[-2,0]}(t)$. Their
convolution is a triangular pulse centered at the origin: $(g * g_{MF})(\tau) = 2\left(1 - \frac{|\tau|}{2}\right)I_{[-2,2]}(\tau)$. We
therefore have

$$R_y(\tau) = \frac{N_0}{2}(g * g_{MF})(\tau) = \frac{1}{2}\left(1 - \frac{|\tau|}{2}\right)I_{[-2,2]}(\tau) = C_y(\tau)$$

(since $y$ is zero mean). The PSD is given by

$$S_y(f) = \frac{N_0}{2}|G(f)|^2 = \mathrm{sinc}^2(2f)$$

since $|G(f)| = |2\,\mathrm{sinc}(2f)|$. Note that these results do not rely on Gaussianity.

(b) The power $E[y^2(100)] = R_y(0) = \frac{1}{2}$.

(c) The output $y$ is a Gaussian random process, since it is obtained by a linear transformation of
the Gaussian random process $n$. Since $y$ is WSS and Gaussian, it is stationary. True.

(d) The random variables $y(100)$ and $y(102)$ are jointly Gaussian with zero mean and covariance
$\mathrm{cov}(y(100), y(102)) = C_y(2) = R_y(2) = 0$. Since they are jointly Gaussian and uncorrelated,
they are independent. True.

(e) In this case, $\mathrm{cov}(y(100), y(101)) = C_y(1) = R_y(1) = \frac{1}{4} \ne 0$, so that $y(100)$ and $y(101)$ are not
independent. False.

(f) The random variable $Z = y(100) - 2y(101) + 3y(102)$ is zero mean and Gaussian, with

$$\begin{aligned}
\mathrm{var}(Z) &= \mathrm{cov}\left(y(100) - 2y(101) + 3y(102),\ y(100) - 2y(101) + 3y(102)\right) \\
&= \mathrm{cov}(y(100), y(100)) + 4\,\mathrm{cov}(y(101), y(101)) + 9\,\mathrm{cov}(y(102), y(102)) \\
&\quad - 4\,\mathrm{cov}(y(100), y(101)) + 6\,\mathrm{cov}(y(100), y(102)) - 12\,\mathrm{cov}(y(101), y(102)) \\
&= C_y(0) + 4C_y(0) + 9C_y(0) - 4C_y(1) + 6C_y(2) - 12C_y(1) \\
&= 14C_y(0) - 16C_y(1) + 6C_y(2) = 3
\end{aligned}$$

substituting $C_y(0) = \frac{1}{2}$, $C_y(1) = \frac{1}{4}$, $C_y(2) = 0$. Thus, $Z \sim N(0,3)$, and the required probability
can be evaluated as

$$P[Z > 5] = Q\left(\frac{5}{\sqrt{3}}\right) = 0.0019$$

(g) We invoke Gaussianity in (c), (d), and (f).
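
The variance computation in part (f) is a quadratic form $\mathbf{a}^T\mathbf{C}\mathbf{a}$ in the covariance matrix of the three samples, which is easy to check by machine. The following sketch assumes the triangular autocovariance $C_y(\tau) = \sigma^2 \cdot 2(1 - |\tau|/2)$ on $[-2,2]$ with $\sigma^2 = 1/4$ from part (a); the function names are illustrative.

```python
import math

def q_function(x):
    """Gaussian complementary CDF Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def autocov_y(tau, sigma2=0.25):
    """C_y(tau) = sigma^2 * 2 * (1 - |tau|/2) on [-2, 2], zero elsewhere."""
    if abs(tau) > 2.0:
        return 0.0
    return sigma2 * 2.0 * (1.0 - abs(tau) / 2.0)

times = [100, 101, 102]
coeffs = [1.0, -2.0, 3.0]   # Z = y(100) - 2 y(101) + 3 y(102)

# Covariance matrix of (y(100), y(101), y(102)); depends only on time differences
cov = [[autocov_y(ti - tj) for tj in times] for ti in times]

# var(Z) = a^T C a
var_z = sum(coeffs[i] * cov[i][j] * coeffs[j]
            for i in range(3) for j in range(3))

prob = q_function(5.0 / math.sqrt(var_z))
print("var(Z) =", var_z)
print("P[Z > 5] =", prob)
```

Changing `coeffs` or `times` gives the distribution of any other linear combination of samples of $y$, since joint Gaussianity is preserved under linear operations.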


5.9.2  Correlation 

As we shall see in Chapter 6, a typical operation in a digital communication receiver is to correlate
a noisy received waveform against one or more noiseless templates. Specifically, the correlation
of $y(t)$ (e.g., a received signal) against $g(t)$ (e.g., a noiseless template at the receiver) is defined
as the inner product between $y$ and $g$, given by

$$\langle y, g\rangle = \int_{-\infty}^{\infty} y(t)g^*(t)\,dt \qquad (5.105)$$

(We restrict attention to real-valued signals in example computations provided here, but the
preceding notation is general enough to include complex-valued signals.)

Signal-to-Noise Ratio and its Maximization


If $y(t)$ is a random process, we can compute the mean and variance of $\langle y, g\rangle$ given the second
order statistics (i.e., mean function and autocorrelation function) of $y$, as shown in Problem 5.50.
However, let us consider here a special case of particular interest in the study of communication
systems:

$$y(t) = s(t) + n(t)$$

where we now restrict attention to real-valued signals for simplicity, with $s(t)$ denoting a
deterministic signal (e.g., corresponding to a specific choice of transmitted symbols) and $n(t)$ zero
mean white noise with PSD $S_n(f) = \frac{N_0}{2}$. The output of correlating $y$ against $g$ is given by

$$Z = \langle s, g\rangle + \langle n, g\rangle = \int_{-\infty}^{\infty} s(t)g(t)\,dt + \int_{-\infty}^{\infty} n(t)g(t)\,dt$$

Since both the signal and noise terms scale up by identical factors if we scale up $g$, a performance
metric of interest is the ratio of the signal power to the noise power at the output of the correlator,
defined as follows:

$$\text{SNR} = \frac{|\langle s, g\rangle|^2}{E\left[|\langle n, g\rangle|^2\right]}$$


How should we choose $g$ in order to maximize SNR? In order to answer this, we need to compute
the noise power in the denominator. We can rewrite it as

$$E\left[|\langle n, g\rangle|^2\right] = E\left[\int n(t)g(t)\,dt \int n(s)g(s)\,ds\right]$$

where we need to use two different dummy variables of integration to make sure we capture all
the cross terms in the two integrals. Now, we take the expectation inside the integrals, grouping
all random terms together inside the expectation:

$$E\left[|\langle n, g\rangle|^2\right] = \int\!\!\int E[n(t)n(s)]\,g(t)g(s)\,dt\,ds = \int\!\!\int R_n(t-s)\,g(t)g(s)\,dt\,ds$$

This is where the infinite power white noise model becomes useful: plugging in $R_n(t-s) =
\frac{N_0}{2}\delta(t-s)$, we find that the two integrals collapse into one, and obtain that

$$E\left[|\langle n, g\rangle|^2\right] = \frac{N_0}{2}\int\!\!\int \delta(t-s)\,g(t)g(s)\,dt\,ds = \frac{N_0}{2}\int |g(t)|^2\,dt = \frac{N_0}{2}\|g\|^2 \qquad (5.106)$$

Thus, the SNR can be rewritten as

$$\text{SNR} = \frac{|\langle s, g\rangle|^2}{\frac{N_0}{2}\|g\|^2} = \frac{2}{N_0}\left|\left\langle s, \frac{g}{\|g\|}\right\rangle\right|^2$$


Drawing on the analogy between signals and vectors, note that $g/\|g\|$ is the “unit vector” pointing
along $g$. We wish to choose $g$ such that the size of the projection of the signal $s$ along this unit
vector is maximized. Clearly, this is accomplished by choosing the unit vector along the direction
of $s$. (A formal proof using the Cauchy-Schwarz inequality is provided in Problem 5.49.) That
is, we must choose $g$ to be a scalar multiple of $s$ (any scalar multiple will do, since SNR is a
scale-invariant quantity). In general, for complex-valued signals in complex-valued white noise
(useful for modeling in complex baseband), it can be shown that $g$ must be a scalar multiple
of $s^*(t)$. When we plug this in, the maximum SNR we obtain is $2\|s\|^2/N_0$. These results are
important enough to state formally, and we do this below.

Theorem 5.9.1 For linear processing of a signal $s(t)$ corrupted by white noise, the output SNR
is maximized by correlating against $s(t)$. The resulting SNR is given by

$$\text{SNR}_{\max} = \frac{2\|s\|^2}{N_0} \qquad (5.107)$$


The  expression  (5.106)  for  the  noise  power  at  the  output  of  a  correlator  is  analogous  to  the 
expression  (5.104)  (Example  5.9.1)  for  the  power  of  white  noise  through  a  filter.  This  is  no 
coincidence.  Any  correlation  operation  can  be  implemented  using  a  filter  and  sampler,  as  we 
discuss  next. 
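
Theorem 5.9.1 can be illustrated by evaluating the correlator SNR for a few candidate templates $g$. This is a sketch under assumptions that are not in the text: a half-sine pulse for $s(t)$, the value $N_0 = 0.5$, and Riemann-sum approximations of the inner products; the helper names are illustrative.

```python
import math

def inner(a, b, dt):
    """Riemann-sum approximation of the inner product of two real signals."""
    return sum(x * y for x, y in zip(a, b)) * dt

def correlator_snr(s, g, n0, dt):
    """SNR = |<s, g>|^2 / ((N0/2) ||g||^2) for real-valued s and g."""
    return inner(s, g, dt) ** 2 / ((n0 / 2.0) * inner(g, g, dt))

# Hypothetical setup: half-sine pulse on [0, 1], N0 = 0.5
dt = 0.001
t = [(k + 0.5) * dt for k in range(1000)]
s = [math.sin(math.pi * tk) for tk in t]
n0 = 0.5

templates = {
    "matched (g = s)": s,
    "scaled matched (g = 3s)": [3.0 * x for x in s],
    "boxcar": [1.0] * len(s),
    "ramp": t,
}

snr_max = 2.0 * inner(s, s, dt) / n0   # bound from (5.107)
for name, g in templates.items():
    print(name, "SNR =", correlator_snr(s, g, n0, dt), "bound =", snr_max)
```

The matched template and its scaled version both attain the bound, reflecting the scale invariance of SNR, while the mismatched templates fall short of it.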


Matched  Filter 

Correlation with a waveform $g(t)$ can be achieved using a filter $h(t) = g(-t)$ and sampling at
time $t = 0$. To see this, note that

$$z(0) = (y * h)(0) = \int_{-\infty}^{\infty} y(\tau)h(-\tau)\,d\tau = \int_{-\infty}^{\infty} y(\tau)g(\tau)\,d\tau$$

Comparing with the correlator output (5.105), we see that $Z = z(0)$. Now, applying Theorem
5.9.1, we see that the SNR is maximized by choosing the filter impulse response as $s^*(-t)$. As
we know, this is called the matched filter for $s$, and we denote its impulse response as $s_{MF}(t) =
s^*(-t)$. We can now restate Theorem 5.9.1 as follows.

Theorem 5.9.2 For linear processing of a signal $s(t)$ corrupted by white noise, the output SNR
is maximized by employing a matched filter with impulse response $s_{MF}(t) = s^*(-t)$, sampled at
time $t = 0$.


Figure 5.26: A signal passed through its matched filter gives a peak at time $t = 0$. When the
signal is delayed by $t_0$, the peak occurs at $t = t_0$.


The statistics of the noise contribution to the matched filter output do not depend on the
sampling time (WSS noise into an LTI system yields a WSS random process), hence the optimum
sampling time is determined by the peak of the signal contribution to the matched filter output.
The signal contribution to the output of the matched filter at time $t$ is given by

$$z(t) = \int s(\tau)s_{MF}(t-\tau)\,d\tau = \int s(\tau)s^*(\tau-t)\,d\tau$$

This is simply the correlation of the signal with itself at delay $t$. Thus, the matched filter enables
us to implement an infinite bank of correlators, each corresponding to a version of our signal
template at a different delay. Figure 5.26 shows a rectangular pulse passed through its matched
filter. For received signal $y(t) = s(t) + n(t)$, we have observed that the optimum sampling time
(i.e., the correlator choice maximizing SNR) is $t = 0$. More generally, when the received signal is
given by $y(t) = s(t - t_0) + n(t)$, the peak of the signal contribution to the matched filter shifts
to $t = t_0$, which now becomes the optimum sampling time.
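
The shift of the matched filter peak with delay can be seen in a discrete-time sketch. The assumptions here are illustrative and not from the text: a unit-amplitude rectangular pulse on $[0,1)$ sampled with spacing `dt`, a direct-sum convolution, and noise omitted so that only the signal contribution is shown.

```python
def convolve(x, h, dt):
    """Discrete approximation of (x * h)(t) on a grid with spacing dt."""
    out = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            out[i + j] += xi * hj * dt
    return out

dt = 0.05
pulse = [1.0] * 20        # hypothetical s(t): unit pulse on [0, 1)
matched = pulse[::-1]     # real-valued s, so s_MF(t) = s(-t)

def peak_time(delay_samples):
    """Peak time and value of the matched filter output for s(t - t0), t0 = delay_samples * dt."""
    received = [0.0] * delay_samples + pulse
    z = convolve(received, matched, dt)
    k_peak = max(range(len(z)), key=lambda k: z[k])
    # The time-reversed pulse occupies [-(len(pulse) - 1) * dt, 0], so output
    # index k corresponds to time (k - (len(pulse) - 1)) * dt.
    return (k_peak - (len(pulse) - 1)) * dt, z[k_peak]

print(peak_time(0))    # undelayed signal
print(peak_time(30))   # signal delayed by t0 = 30 * dt
```

In both cases the peak value equals the pulse energy $\|s\|^2$, and the peak location moves from $t = 0$ to $t = t_0$, which is why the optimum sampling time tracks the delay.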

While the preceding computations rely only on second order statistics, once we invoke the
Gaussianity of the noise, as we do in Chapter 6, we will be able to compute probabilities (a preview
of such computations is provided by Examples 5.8.2 and 5.9.2(f)). This will enable us to develop
a framework for receiver design for minimizing the probability of error.


5.10  Concept  Summary 

We  do  not  summarize  here  the  review  of  probability  and  random  variables,  but  note  that  key 
concepts  relevant  for  communication  systems  modeling  are  conditional  probabilities  and  densities, 
and  associated  results  such  as  law  of  total  probability  and  Bayes’  rule.  As  we  see  in  much  greater 
detail  in  Chapter  6,  conditional  probabilities  and  densities  are  used  for  statistical  characterization 
of  the  received  signal,  given  the  transmitted  signal,  while  Bayes’  rule  can  be  used  to  infer  which 
signal  was  transmitted,  given  the  received  signal. 

Gaussian  random  variables 

• A Gaussian random variable $X \sim N(m, v^2)$ is characterized by its mean $m$ and variance $v^2$.

• Gaussianity is preserved under translation and scaling. Particularly useful is the transformation
to a standard ($N(0,1)$) Gaussian random variable: if $X \sim N(m, v^2)$, then $\frac{X-m}{v} \sim N(0,1)$. This
allows probabilities involving any Gaussian random variable to be expressed in terms of the CDF
$\Phi(x)$ and CCDF $Q(x)$ for a standard Gaussian random variable.

• Random variables $X_1, \ldots, X_n$ are jointly Gaussian, or $\mathbf{X} = (X_1, \ldots, X_n)^T$ is a Gaussian random
vector, if any linear combination $\mathbf{a}^T\mathbf{X} = a_1X_1 + \ldots + a_nX_n$ is a Gaussian random variable.

• A Gaussian random vector $\mathbf{X} \sim N(\mathbf{m}, \mathbf{C})$ is completely characterized by its mean vector $\mathbf{m}$
and covariance matrix $\mathbf{C}$.

• Uncorrelated and jointly Gaussian random variables are independent.

• The joint density for $\mathbf{X} \sim N(\mathbf{m}, \mathbf{C})$ exists if and only if $\mathbf{C}$ is invertible.

• The mean vector and covariance matrix evolve separately under affine transformations: for
$\mathbf{Y} = \mathbf{A}\mathbf{X} + \mathbf{b}$, $\mathbf{m}_Y = \mathbf{A}\mathbf{m}_X + \mathbf{b}$ and $\mathbf{C}_Y = \mathbf{A}\mathbf{C}_X\mathbf{A}^T$.

• Joint Gaussianity is preserved under affine transformations: if $\mathbf{X} \sim N(\mathbf{m}, \mathbf{C})$ and $\mathbf{Y} = \mathbf{A}\mathbf{X} + \mathbf{b}$,
then $\mathbf{Y} \sim N(\mathbf{A}\mathbf{m} + \mathbf{b}, \mathbf{A}\mathbf{C}\mathbf{A}^T)$.

Random  processes 

•  A  random  process  is  a  generalization  of  the  concept  of  random  vector;  it  is  a  collection  of 
random  variables  on  a  common  probability  space. 

• While statistical characterization of a random process requires specification of the
finite-dimensional distributions, coarser characterization via its second order statistics (the mean and
autocorrelation functions) is often employed.

•  A  random  process  X  is  stationary  if  its  statistics  are  shift- invariant;  it  is  WSS  if  its  second 
order  statistics  are  shift- invariant. 

•  A  random  process  is  Gaussian  if  any  collection  of  samples  is  a  Gaussian  random  vector,  or 
equivalently,  if  any  linear  combination  of  any  collection  of  samples  is  a  Gaussian  random  variable. 

•  A  Gaussian  random  process  is  completely  characterized  by  its  mean  and  autocorrelation  (or 
mean  and  autocovariance)  functions. 

•  A  stationary  process  is  WSS.  A  WSS  Gaussian  random  process  is  stationary. 

•  The  autocorrelation  function  and  the  power  spectral  density  form  a  Fourier  transform  pair. 
(This  observation  applies  both  to  time  averages  and  to  ensemble  averages  for  WSS  processes.) 

• The most common model for noise in communication systems is WGN. WGN $n(t)$ is zero mean,
WSS, Gaussian with a flat PSD $S_n(f) = \sigma^2 = \frac{N_0}{2} \leftrightarrow R_n(\tau) = \sigma^2 \delta(\tau)$. While physically
unrealizable (it has infinite power), it is a useful mathematical abstraction for modeling the flatness
of the noise PSD over the band of interest. In complex baseband, noise is modeled as I and Q
components which are independent real-valued WGN processes.

• Passing a WSS random process $X$ through an LTI system with impulse response $g(t)$ yields a WSS
random process $Y$. $X$ and $Y$ are also jointly WSS. We have $S_Y(f) = S_X(f)|G(f)|^2 \leftrightarrow R_Y(\tau) = (R_X * g * g_{mf})(\tau)$.

•  The  statistics  of  WGN  after  linear  operations  such  as  correlation  and  filtering  are  easy  to 
compute  because  of  its  impulsive  autocorrelation  function. 

•  When  the  received  signal  equals  signal  plus  WGN,  the  SNR  is  maximized  by  matched  filtering 
against  the  signal. 
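
The matched filtering statement can be checked numerically in a simple discrete-time analogue. The sketch below assumes a sampled signal `s`, a correlator with weights `h`, and i.i.d. zero mean noise samples of variance `sigma2`, so that the output SNR is $(\sum_k s_k h_k)^2 / (\sigma^2 \sum_k h_k^2)$. The signal values and noise variance are illustrative assumptions, not taken from the text.

```python
def correlator_snr(s, h, sigma2):
    """SNR at the output of a correlator with weights h for signal s in white noise."""
    signal = sum(si * hi for si, hi in zip(s, h))   # signal contribution
    noise_var = sigma2 * sum(hi * hi for hi in h)   # variance of noise contribution
    return signal ** 2 / noise_var


# Hypothetical sampled triangular pulse and noise variance per sample
s = [0.0, 0.5, 1.0, 0.5, 0.0]
sigma2 = 0.1

snr_matched = correlator_snr(s, s, sigma2)              # correlate against the signal itself
snr_boxcar = correlator_snr(s, [1.0] * len(s), sigma2)  # plain integrator

print("matched:", snr_matched)
print("boxcar: ", snr_boxcar)
```

By the Cauchy-Schwarz inequality, no other choice of weights can exceed the SNR obtained with `h` proportional to `s`.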


5.11  Endnotes 

There  are  a  number  of  textbooks  on  probability  and  random  processes  for  engineers  that  can 
be  used  to  supplement  the  brief  communications-centric  exposition  here,  including  Yates  and 
Goodman  [25],  Woods  and  Stark  [26],  Leon-Garcia  [27],  and  Papoulis  and  Pillai  [28]. 

A more detailed treatment of the noise analysis for analog modulation provided in Appendix 5.E
can be found in a number of communication theory texts, with Ziemer and Tranter [4] providing
a sound exposition.

As  a  historical  note,  thermal  noise,  which  plays  such  a  crucial  role  in  communications  systems 
design,  was  first  experimentally  characterized  in  1928  by  Johnson  [29].  Johnson  discussed  his 
results  with  Nyquist,  who  quickly  came  up  with  a  theoretical  characterization  [30].  See  [31]  for  a 
modern  re-derivation  of  Nyquist’s  formula,  and  [32]  for  a  discussion  of  noise  in  transistors.  These 
papers  and  the  references  therein  are  good  resources  for  further  exploration  into  the  physical  basis 
for  noise,  which  we  can  only  hint  at  here  in  Appendix  5.C.  Of  course,  as  discussed  in  Section 
5.8,  from  a  communication  systems  designer’s  point  of  view,  it  typically  suffices  to  abstract  away 
from  such  physical  considerations,  using  the  noise  figure  as  a  single  number  summarizing  the 
effect  of  receiver  circuit  noise. 


5.12  Problems 

Conditional  probabilities,  law  of  total  probability,  and  Bayes’  rule 

Problem  5.1  You  are  given  a  pair  of  dice  (each  with  six  sides).  One  is  fair,  the  other  is  unfair. 
The  probability  of  rolling  6  with  the  unfair  die  is  1/2,  while  the  probability  of  rolling  1  through 
5  is  1/10.  You  now  pick  one  of  the  dice  at  random  and  begin  rolling.  Conditioned  on  the  die 
picked,  successive  rolls  are  independent. 

(a)  Conditioned  on  picking  the  unfair  die,  what  is  the  probability  of  the  sum  of  the  numbers  in 
the  first  two  rolls  being  equal  to  10? 

(b)  Conditioned  on  getting  a  sum  of  10  in  your  first  two  throws,  what  is  the  probability  that  you 
picked  the  unfair  die? 


Problem  5.2  A  student  who  studies  for  an  exam  has  a  90%  chance  of  passing.  A  student  who 
does  not  study  for  the  exam  has  a  90%  chance  of  failing.  Suppose  that  70%  of  the  students 
studied  for  the  exam. 

(a)  What  is  the  probability  that  a  student  fails  the  exam? 

(b)  What  is  the  probability  that  a  student  who  fails  studied  for  the  exam? 

(c)  What  is  the  probability  that  a  student  who  fails  did  not  study  for  the  exam? 

(d) Would you expect the probabilities in (b) and (c) to add up to one?

Problem 5.3 A receiver decision statistic $Y$ in a communication system is modeled as exponential
with mean 1 if 0 is sent, and as exponential with mean 10 if 1 is sent. Assume that we
send 0 with probability 0.6.

(a)  Find  the  conditional  probability  that  Y  >  5,  given  that  0  is  sent. 

(b)  Find  the  conditional  probability  that  Y  >  5,  given  that  1  is  sent. 

(c)  Find  the  unconditional  probability  that  Y  >  5. 

(d)  Given  that  Y  >  5,  what  is  the  probability  that  0  is  sent? 

(e)  Given  that  Y  =  5,  what  is  the  probability  that  0  is  sent? 

Problem 5.4 Channel codes are constructed by introducing redundancy in a structured fashion.
A canonical means of doing this is by introducing parity checks. In this problem, we see how
one can make inferences based on three bits $b_1, b_2, b_3$ which satisfy a parity check equation:
$b_1 \oplus b_2 \oplus b_3 = 0$. Here $\oplus$ denotes an exclusive or (XOR) operation.

(a) Suppose that we know that $P[b_1 = 0] = 0.8$ and $P[b_2 = 1] = 0.9$, and model $b_1$ and $b_2$ as
independent. Find the probability $P[b_3 = 0]$.

(b) Define the log likelihood ratio (LLR) for a bit $b$ as $LLR(b) = \log \frac{P[b=0]}{P[b=1]}$. Setting
$L_i = LLR(b_i)$, $i = 1, 2, 3$, find an expression for $L_3$ in terms of $L_1$ and $L_2$, again modeling $b_1$ and $b_2$
as independent.

Problem 5.5 A bit $X \in \{0, 1\}$ is repeatedly transmitted using $n$ independent uses of a binary
symmetric channel (i.e., the binary channel in Figure 5.2 with $a = b$) with crossover probability
$a = 0.1$. The receiver uses a majority rule to make a decision on the transmitted bit. Derive
general expressions as a function of $n$ (assume that $n$ is odd, so there are no ties in the majority
rule), and substitute $n = 5$ for numerical results and plots.

(a) Let $Z$ denote the number of ones at the channel output. ($Z$ takes values $0, 1, \ldots, n$.) Specify
the probability mass function of $Z$, conditioned on $X = 0$.

(b)  Conditioned  on  X  =  0,  what  is  the  probability  of  deciding  that  one  was  sent  (i.e.,  what  is 
the  probability  of  making  an  error)? 

(c) Find the posterior probabilities $P[X = 0 | Z = m]$, $m = 0, 1, \ldots, n$, assuming that 0 or 1 are
equally likely to be sent. Do a stem plot against $m$.

(d)  Repeat  (c)  assuming  that  the  0  is  sent  with  probability  0.9. 

(e) As an alternative visualization, plot the LLR $\log \frac{P[X=0|Z=m]}{P[X=1|Z=m]}$ versus $m$ for (c) and (d).

Figure 5.27: Two-input four-output channel for Problem 5.6.


Problem 5.6 Consider the two-input, four-output channel with transition probabilities shown
in Figure 5.27. In your numerical computations, take $p = 0.05$, $q = 0.1$, $r = 0.3$. Denote the
channel input by $X$ and the channel output by $Y$.

(a) Assume that 0 and 1 are equally likely to be sent. Find the conditional probability of 0
being sent, given each possible value of the output. That is, compute $P[X = 0 | Y = y]$ for each
$y \in \{-3, -1, +1, +3\}$.

(b) Express the results in (a) as log likelihood ratios (LLRs). That is, compute $L(y) = \log \frac{P[X=0|Y=y]}{P[X=1|Y=y]}$
for each $y \in \{-3, -1, +1, +3\}$.

(c) Assume that a bit $X$, chosen equiprobably from $\{0, 1\}$, is sent repeatedly, using three independent
uses of the channel. The channel outputs can be represented as a vector $\mathbf{Y} = (Y_1, Y_2, Y_3)^T$.
For channel outputs $\mathbf{y} = (+1, +3, -1)^T$, find the conditional probabilities $P[\mathbf{Y} = \mathbf{y} | X = 0]$ and
$P[\mathbf{Y} = \mathbf{y} | X = 1]$.

(d) Use Bayes' rule and the result of (c) to find the posterior probability $P[X = 0 | \mathbf{Y} = \mathbf{y}]$ for
$\mathbf{y} = (+1, +3, -1)^T$. Also compute the corresponding LLR $L(\mathbf{y}) = \log \frac{P[X=0|\mathbf{Y}=\mathbf{y}]}{P[X=1|\mathbf{Y}=\mathbf{y}]}$.

(e) Would you decide 0 or 1 was sent when you see the channel output $\mathbf{y} = (+1, +3, -1)^T$?


Random  variables 

Problem  5.7  Let  X  denote  an  exponential  random  variable  with  mean  10. 

(a)  What  is  the  probability  that  X  is  bigger  than  20? 

(b)  What  is  the  probability  that  X  is  smaller  than  5? 

(c)  Suppose  that  we  know  that  X  is  bigger  than  10.  What  is  the  conditional  probability  that  it 
is  bigger  than  20? 

(d) Find $E[e^{-X}]$.

(e) Find $E[X^3]$.

Problem 5.8 Let $U_1, \ldots, U_n$ denote i.i.d. random variables with CDF $F_U(u)$.

(a) Let $X = \max(U_1, \ldots, U_n)$. Show that

$$P[X \le x] = F_U^n(x)$$

(b) Let $Y = \min(U_1, \ldots, U_n)$. Show that

$$P[Y \le y] = 1 - (1 - F_U(y))^n$$

(c) Suppose that $U_1, \ldots, U_n$ are uniform over $[0, 1]$. Plot the CDF of $X$ for $n = 1$, $n = 5$ and
$n = 10$, and comment on any trends that you notice.

(d)  Repeat  (c)  for  the  CDF  of  Y. 

Problem 5.9 True or False: The minimum of two independent exponential random variables
is exponential.

True  or  False:  The  maximum  of  two  independent  exponential  random  variables  is  exponential. 


Problem  5.10  Let  U  and  V  denote  independent  and  identically  distributed  random  variables, 
uniformly  distributed  over  [0, 1]. 

(a)  Find  and  sketch  the  CDF  of  X  =  min (U,  V ). 

Hint: It might be useful to consider the complementary CDF.

(b) Find and sketch the CDF of $Y = V/U$. Make sure you specify the range of values taken by $Y$.

Hint:  It  is  helpful  to  draw  pictures  in  the  ( u ,  v)  plane  when  evaluating  the  probabilities  of  interest. 

Problem 5.11 (Relation between Gaussian and exponential) Suppose that $X_1$ and $X_2$
are i.i.d. $N(0, 1)$.

(a) Show that $Z = X_1^2 + X_2^2$ is exponential with mean 2.

(b) True or False: $Z$ is independent of $\Theta = \tan^{-1} \frac{X_2}{X_1}$.

Hint: Use the results from Example 5.4.3, which tells us the joint distribution of $\sqrt{Z}$ and $\Theta$.

Problem 5.12 (The role of the uniform random variable in simulations) Let $U$ denote
a random variable which is uniformly distributed over $[0, 1]$.

(a) Let $F(x)$ denote an arbitrary CDF (assume for simplicity that it is continuous). Defining $X = F^{-1}(U)$, show that
$X$ has CDF $F(x)$.

Remark: This gives us a way of generating random variables with arbitrary distributions, assuming
that we have a random number generator for uniform random variables. The method works
even if $X$ is a discrete or mixed random variable, as long as $F^{-1}$ is defined appropriately.

(b) Find a function $g$ such that $Y = g(U)$ is exponential with mean 2, where $U$ is uniform over $[0, 1]$.

(c)  Use  the  result  in  (b)  and  Matlab’s  rand()  function  to  generate  an  i.i.d.  sequence  of  1000 
exponential  random  variables  with  mean  2.  Plot  the  histogram  and  verify  that  it  has  the  right 
shape. 
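
The method described in the remark to Problem 5.12 (often called inverse transform sampling) can be sketched for a distribution other than the one in the problem. The example below assumes the hypothetical CDF $F(x) = x^2$ on $[0, 1]$, for which $F^{-1}(u) = \sqrt{u}$, and uses Python's standard `random` module in place of Matlab's rand(); the helper name and the sample size are illustrative choices.

```python
import math
import random


def sample_inverse_cdf(inv_cdf, n, rng):
    """Draw n samples X = F^{-1}(U), with U uniform over [0, 1)."""
    return [inv_cdf(rng.random()) for _ in range(n)]


rng = random.Random(0)   # fixed seed so that the run is repeatable
samples = sample_inverse_cdf(math.sqrt, 10000, rng)

# Compare the empirical CDF at a test point with the target F(0.5) = 0.25
empirical_cdf_at_half = sum(x <= 0.5 for x in samples) / len(samples)
print("empirical F(0.5):", empirical_cdf_at_half, " target:", 0.25)
```

The same pattern applies to any CDF whose inverse can be evaluated, either in closed form or numerically.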


Problem 5.13 (Generating Gaussian random variables) Suppose that $U_1$, $U_2$ are i.i.d.
and uniform over $[0, 1]$.

(a) What is the joint distribution of $Z = -2 \ln U_1$ and $\Theta = 2\pi U_2$?

(b) Show that $X_1 = \sqrt{Z} \cos\Theta$ and $X_2 = \sqrt{Z} \sin\Theta$ are i.i.d. $N(0, 1)$ random variables.

Hint: Use Example 5.4.3 and Problem 5.11.

(c) Use the result of (b) to generate 2000 i.i.d. $N(0, 1)$ random variables from 2000 i.i.d. random
variables uniformly distributed over $[0, 1]$, using Matlab's rand() function. Check that the
histogram has the right shape.

(d) Use simulations to estimate $E[X^2]$, where $X \sim N(0, 1)$, and compare with the analytical
result.

(e) Use simulations to estimate $P[X^3 + X > 3]$, where $X \sim N(0, 1)$.


Problem 5.14 (Generating discrete random variables) Let $U_1, \ldots, U_n$ denote i.i.d. random
variables uniformly distributed over $[0, 1]$ (e.g., generated by the rand() function in Matlab).
Define, for $i = 1, \ldots, n$,

$$Y_i = \begin{cases} 1, & U_i > 0.7 \\ 0, & U_i \le 0.7 \end{cases}$$

(a) Sketch the CDF of $Y_1$.

(b) Find (analytically) and plot the PMF of $Z = Y_1 + \ldots + Y_n$, for $n = 20$.

(c) Use simulation to estimate and plot the histogram of $Z$, and compare against the PMF in
(b).

(d) Estimate $E[Z]$ by simulation and compare against the analytical result.

(e) Estimate $E[Z^3]$ by simulation.


Gaussian  random  variables 


Problem 5.15 Two random variables $X$ and $Y$ have joint density

$$p_{X,Y}(x, y) = \begin{cases} K e^{-\frac{2x^2 + y^2}{2}}, & xy \ge 0 \\ 0, & xy < 0 \end{cases}$$


(a) Find $K$.

(b) Show that $X$ and $Y$ are each Gaussian random variables.

(c) Express the probability $P[X^2 + X > 2]$ in terms of the Q function.

(d) Are $X$ and $Y$ jointly Gaussian?

(e) Are $X$ and $Y$ independent?

(f) Are $X$ and $Y$ uncorrelated?

(g) Find the conditional density $p_{X|Y}(x|y)$. Is it Gaussian?


Problem 5.16 (computations involving joint Gaussianity) The random vector $\mathbf{X} = (X_1, X_2)^T$
is Gaussian with mean vector $\mathbf{m} = (2, 1)^T$ and covariance matrix $\mathbf{C}$ given by


(a) Let $Y_1 = X_1 + 2X_2$, $Y_2 = -X_1 + X_2$. Find $\mathrm{cov}(Y_1, Y_2)$.

(b) Write down the joint density of $Y_1$ and $Y_2$.

(c) Express the probability $P[Y_1 > 2Y_2 + 1]$ in terms of the Q function.

Problem 5.17 (computations involving joint Gaussianity) The random vector $\mathbf{X} = (X_1, X_2)^T$
is Gaussian with mean vector $\mathbf{m} = (-3, 2)^T$ and covariance matrix $\mathbf{C}$ given by


(a) Let $Y_1 = 2X_1 - X_2$, $Y_2 = -X_1 + 3X_2$. Find $\mathrm{cov}(Y_1, Y_2)$.

(b) Write down the joint density of $Y_1$ and $Y_2$.

(c)  Express  the  probability  P[Yj  >  2Yj  —  1]  in  terms  of  the  Q  function  with  positive  arguments. 

(d)  Express  the  probability  P[Y{  >  3Yj  + 10]  in  terms  of  the  Q  function  with  positive  arguments. 

Problem  5.18  (plotting  the  joint  Gaussian  density)  For  jointly  Gaussian  random  variables 
X  and  Y,  plot  the  density  and  its  contours  as  in  Figure  5.15  for  the  following  parameters: 

(a) $\sigma_X^2 = 1$, $\sigma_Y^2 = 1$, $\rho = 0$.

(b) $\sigma_X^2 = 1$, $\sigma_Y^2 = 1$, $\rho = 0.5$.

(c) $\sigma_X^2 = 4$, $\sigma_Y^2 = 1$, $\rho = 0.5$.

(d)  Comment  on  the  differences  between  the  plots  in  the  three  cases. 

Problem  5.19  (computations  involving  joint  Gaussianity)  In  each  of  the  three  cases  in 
Problem  5.18, 

(a) specify the distribution of $X - 2Y$;

(b) determine whether $X - 2Y$ is independent of $X$.

Problem  5.20  (computations  involving  joint  Gaussianity)  X  and  Y  are  jointly  Gaussian, 
each  with  variance  one,  and  with  normalized  correlation  —  |.  The  mean  of  X  equals  one,  and 
the  mean  of  Y  equals  two. 

(a)  Write  down  the  covariance  matrix. 

(b) What is the distribution of $Z = 2X + 3Y$?

(c) Express the probability $P[Z^2 - Z > 6]$ in terms of the Q function with positive arguments, and
then  evaluate  it  numerically. 

Problem 5.21 (From Gaussian to Rayleigh, Rician, and Exponential Random Variables)
Let $X_1$, $X_2$ be i.i.d. Gaussian random variables, each with mean zero and variance $v^2$.
Define $(R, \Phi)$ as the polar representation of the point $(X_1, X_2)$, i.e.,

$$X_1 = R \cos\Phi, \quad X_2 = R \sin\Phi$$

where $R \ge 0$ and $\Phi \in [0, 2\pi]$.

(a) Find the joint density of $R$ and $\Phi$.

(b) Observe from (a) that $R$, $\Phi$ are independent. Show that $\Phi$ is uniformly distributed in $[0, 2\pi]$,
and find the marginal density of $R$.

(c) Find the marginal density of $R^2$.

(d) What is the probability that $R^2$ is at least 20 dB below its mean value? Does your answer
depend on the value of $v^2$?

Remark:  The  random  variable  R  is  said  to  have  a  Rayleigh  distribution.  Further,  you  should 
recognize  that  R2  has  an  exponential  distribution. 


Random  Processes 

Problem 5.22 Let $X(t) = 2\sin(20\pi t + \Theta)$, where $\Theta$ takes values with equal probability in the
set $\{0, \pi/2, \pi, 3\pi/2\}$.

(a)  Find  the  ensemble-averaged  mean  function  and  autocorrelation  function  of  X. 

(b)  Is  X  WSS? 

(c)  Is  X  stationary? 

(d) Find the time-averaged mean and autocorrelation function of $X$. Do these depend on the
realization of $\Theta$?

(e)  Is  X  ergodic  in  mean  and  autocorrelation? 

Problem  5.23  For  each  of  the  following  functions,  sketch  it  and  state  whether  it  can  be  a  valid 
autocorrelation  function.  Give  reasons  for  your  answers. 

(a) $f_1(\tau) = (1 - |\tau|) I_{[-1,1]}(\tau)$.

(b) $f_2(\tau) = f_1(\tau - 1)$.

(c)  /3(r)  =  /i(r)  -  \  (/i(r  -  1)  +  A(r  +  1)). 


Problem 5.24 Consider the random process $X_p(t) = X_c(t) \cos 2\pi f_c t - X_s(t) \sin 2\pi f_c t$, where
$X_c$, $X_s$ are random processes defined on a common probability space.

(a)  Find  conditions  on  Xc  and  Xs  such  that  Xp  is  WSS. 

(b)  Specify  the  (ensemble  averaged)  autocorrelation  function  and  PSD  of  Xp  under  the  conditions 
in  (a). 

(c)  Assuming  that  the  conditions  in  (a)  hold,  what  are  the  additional  conditions  for  Xp  to  be  a 
passband  random  process? 

Problem 5.25 Consider the square wave $x(t) = \sum_{n=-\infty}^{\infty} (-1)^n p(t - n)$, where $p(t) = I_{[-1/2,1/2]}(t)$.

(a) Find the time-averaged autocorrelation function of $x$ by direct computation in the time domain.

Hint:  The  autocorrelation  function  of  a  periodic  signal  is  periodic. 

(b)  Find  the  Fourier  series  for  x,  and  use  this  to  find  the  PSD  of  x. 

(c)  Are  the  answers  in  (a)  and  (b)  consistent? 


Problem 5.26 Consider again the square wave $x(t) = \sum_{n=-\infty}^{\infty} (-1)^n p(t - n)$, where $p(t) =
I_{[-1/2,1/2]}(t)$. Define the random process $X(t) = x(t - D)$, where $D$ is a random variable which
is uniformly distributed over the interval $[0, 1]$.

(a) Find the ensemble averaged autocorrelation function of $X$.

(b) Is $X$ WSS?

(c)  Is  X  stationary? 

(d) Is $X$ ergodic in mean and autocorrelation function?

Problem 5.27 Let $n(t)$ denote a zero mean baseband random process with PSD $S_n(f) =
I_{[-1,1]}(f)$. Find and sketch the PSD of the following random processes.

(a) $x_1(t) = \frac{dn}{dt}(t)$.

(b) $x_2(t) = \frac{n(t) - n(t - d)}{d}$, for $d = \frac{1}{2}$.

(c) Find the powers of $x_1$ and $x_2$.


Problem 5.28 Consider a WSS random process $X$ with autocorrelation function $R_X(\tau) = e^{-a|\tau|}$,
where $a > 0$.

(a)  Find  the  output  power  when  X  is  passed  through  an  ideal  LPF  of  bandwidth  W. 

(b)  Find  the  99%  power  containment  bandwidth  of  X.  How  does  it  scale  with  the  parameter  a? 


Figure 5.28: Baseband communication system in Problem 5.29 (block diagram: channel, equalizer, estimated message).


Problem  5.29  Consider  the  baseband  communication  system  depicted  in  Figure  5.28,  where  the 

message  is  modeled  as  a  random  process  with  PSD  Sm(f )  =  2  ^1  —  fi-2,2 ](/)■  Receiver  noise 

is  modeled  as  bandlimited  white  noise  with  two-sided  PSD  Sn(f )  =  |/[-3,3](/).  The  equalizer 
removes  the  signal  distortion  due  to  the  channel. 

(a)  Find  the  signal  power  at  the  channel  input. 

(b)  Find  the  signal  power  at  the  channel  output. 

(c)  Find  the  SNR  at  the  equalizer  input. 

(d)  Find  the  SNR  at  the  equalizer  output. 

Problem 5.30 A zero mean WSS random process $X$ has power spectral density $S_X(f) = (1 - |f|) I_{[-1,1]}(f)$.

(a) Find $E[X(100)X(100.5)]$, leaving your answer in as explicit a form as you can.

(b)  Find  the  output  power  when  X  is  passed  through  a  filter  with  impulse  response  h(t)  =  sinct. 

Problem 5.31 A signal $s(t)$ in a communication system is modeled as a zero mean random
process with PSD $S_s(f) = (1 - |f|) I_{[-1,1]}(f)$. The received signal is given by $y(t) = s(t) + n(t)$,
where $n$ is WGN with PSD $S_n(f) = 0.001$. The received signal is passed through an ideal lowpass
filter  with  transfer  function  H(f)  = 

(a)  Find  the  SNR  (ratio  of  signal  power  to  noise  power)  at  the  filter  input. 

(b)  Is  the  SNR  at  the  filter  output  better  for  B  =  1  or  B  =  Give  a  quantitative  justification 
for  your  answer. 

Problem 5.32 White noise $n$ with PSD $\frac{N_0}{2}$ is passed through an RC filter with impulse response
$h(t) = e^{-t/T_0} I_{[0,\infty)}(t)$, where $T_0$ is the RC time constant, to obtain the output $y = n * h$.

(a) Find the autocorrelation function, PSD and power of $y$.

(b)  Assuming  now  that  the  noise  is  a  Gaussian  random  process,  find  a  value  of  to  such  that 
y(t o)  —  \y{ 0)  is  independent  of  2/(0),  or  say  why  such  a  t0  cannot  be  found. 
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Problem  5.33  Find  the  noise  power  at  the  output  of  the  filter  for  the  following  two  scenarios: 

(a)  Baseband  white  noise  with  (two-sided)  PSD  ^  is  passed  through  a  filter  with  impulse 
response  h(t)  =  sinc2t. 

(b)  Passband  white  noise  with  (two-sided)  PSD  is  passed  through  a  filter  with  impulse  response 
h{t)  =  sinc2t  cos  1007ri. 

Problem 5.34 Suppose that WGN $n(t)$ with PSD $\sigma^2 = \frac{N_0}{2} = 1$ is passed through a filter with
impulse response $h(t) = I_{[-1,1]}(t)$ to obtain the output $y(t) = (n * h)(t)$.

(a) Find and sketch the output power spectral density $S_y(f)$, carefully labeling the axes.

(b) Specify the joint distribution of the three consecutive samples $y(1), y(2), y(3)$.

(c) Find the probability that $y(1) - 2y(2) + y(3)$ exceeds 10.

Problem  5.35  (computations  involving  deterministic  signal  plus  WGN)  Consider  the 
noisy  received  signal 

y(t)  =  s(t )  +n(t) 

where $s(t) = I_{[0,3]}(t)$ and $n(t)$ is WGN with PSD $\sigma^2 = N_0/2 = 1/4$. The receiver computes the
following statistics:

Yi  =  J  y(t)dt  ,  Y 2  =  y(t)dt 

(a) Specify the joint distribution of $Y_1$ and $Y_2$.

(b) Compute the probability $P[Y_1 + Y_2 \le 2]$, expressing it in terms of the Q function with positive
arguments.

Problem 5.36 (filtered WGN) Let $n(t)$ denote WGN with PSD $S_n(f) = \sigma^2$. We pass $n(t)$
through a filter with impulse response $h(t) = I_{[0,1]}(t) - I_{[1,2]}(t)$ to obtain $z(t) = (n * h)(t)$.

(a) Find and sketch the autocorrelation function of $z(t)$.

(b) Specify the joint distribution of $z(49)$ and $z(50)$.

(c) Specify the joint distribution of $z(49)$ and $z(52)$.

(d) Evaluate the probability $P[2z(50) > z(49) + z(51)]$. Assume $\sigma^2 = 1$.

(e) Evaluate the probability $P[2z(50) > z(49) + z(51) + 2]$. Assume $\sigma^2 = 1$.

Problem 5.37 (filtered WGN) Let $n(t)$ denote WGN with PSD $S_n(f) = \sigma^2$. We pass $n(t)$
through a filter with impulse response $h(t) = 2I_{[0,2]}(t) - I_{[1,2]}(t)$ to obtain $z(t) = (n * h)(t)$.

(a) Find and sketch the autocorrelation function of $z(t)$.

(b) Specify the joint distribution of $z(0)$, $z(1)$, $z(2)$.

(c) Compute the probability $P[z(0) - z(1) + z(2) > 4]$ (assume $\sigma^2 = 1$).

Problem 5.38 (filtered and sampled WGN) Let $n(t)$ denote WGN with PSD $S_n(f) = \sigma^2$.
We pass $n(t)$ through a filter with impulse response $h(t)$ to obtain $z(t) = (n * h)(t)$, and then
sample it at rate $1/T_s$ to obtain the sequence $z[n] = z(nT_s)$, where $n$ takes integer values.

(a) Show that

$$\mathrm{cov}(z[n], z[m]) = E[z[n]z^*[m]] = \frac{N_0}{2} \int h(t) h^*(t - (n - m)T_s)\, dt$$

with $N_0/2 = \sigma^2$. (We are interested in real-valued impulse responses, but we continue to develop a framework
general enough to encompass complex-valued responses.)

(b) For $h(t) = I_{[0,1]}(t)$, specify the joint distribution of $(z[1], z[2], z[3])^T$ for a sampling rate of 2
($T_s = \frac{1}{2}$).

(c)  Repeat  (b)  for  a  sampling  rate  of  1. 

(d) For a general $h$ sampled at rate $1/T_s$, show that the noise samples are independent if $h(t)$ is
square root Nyquist at rate $1/T_s$.

Problem 5.39 Consider the signal $s(t) = I_{[0,2]}(t) - 2I_{[1,3]}(t)$.

(a) Find and sketch the impulse response $s_{mf}(t)$ of the matched filter for $s$.

(b) Find and sketch the output when $s(t)$ is passed through its matched filter.

(c) Suppose that, instead of the matched filter, all we have available is a filter with impulse
response $h(t) = I_{[0,1]}(t)$. For an arbitrary input signal $x(t)$, show how $z(t) = (x * s_{mf})(t)$ can be
synthesized from $y(t) = (x * h)(t)$.

Problem 5.40 (Correlation via filtering and sampling) A signal $x(t)$ is passed through a
filter with impulse response $h(t) = I_{[0,2]}(t)$ to obtain an output $y(t) = (x * h)(t)$.

(a) Find and sketch a signal $g_1(t)$ such that

$$y(2) = \langle x, g_1 \rangle = \int x(t) g_1(t)\, dt$$

(b) Find and sketch a signal $g_2(t)$ such that

$$y(1) - 2y(2) = \langle x, g_2 \rangle = \int x(t) g_2(t)\, dt$$


Problem 5.41 (Correlation via filtering and sampling) Let us generalize the result we
were hinting at in Problem 5.40. Suppose an arbitrary signal $x$ is passed through an arbitrary
filter $h(t)$ to obtain output $y(t) = (x * h)(t)$.

(a) Show that taking a linear combination of samples at the filter output is equivalent to a
correlation operation on $x$. That is, show that

$$\sum_{i=1}^{n} \alpha_i y(t_i) = \langle x, g \rangle = \int x(t) g(t)\, dt$$

where

$$g(t) = \sum_{i=1}^{n} \alpha_i h(t_i - t) = \sum_{i=1}^{n} \alpha_i h_{mf}(t - t_i) \tag{5.108}$$

That is, taking a linear combination of samples is equivalent to correlating against a signal which
is a linear combination of shifted versions of the matched filter for $h$.

(b) The preceding result can be applied to approximate a correlation operation by taking linear
combinations at the output of a filter. Suppose that we wish to perform a correlation against a
triangular pulse $g(t) = (1 - |t|) I_{[-1,1]}(t)$. How would you approximate this operation by taking a
linear combination of samples at the output of a filter with impulse response $h(t) = I_{[0,1]}(t)$?
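
The identity (5.108) has a direct discrete-time counterpart that can be checked numerically. The sketch below is an illustration under assumed values: the sequences `x` and `h`, the weights `alphas` and the sample indices `times` are made up, and sums replace the integrals of the continuous-time statement.

```python
def conv_at(x, h, k):
    """Value at index k of the discrete convolution y[k] = sum_t x[t] h[k - t]."""
    return sum(x[t] * h[k - t] for t in range(len(x)) if 0 <= k - t < len(h))


# Hypothetical input, filter, combining weights and sampling indices
x = [1.0, -2.0, 0.5, 3.0, -1.0]
h = [1.0, 0.5, 0.25]
alphas = [2.0, -1.0]
times = [2, 4]

# Left side: linear combination of filter output samples
lhs = sum(a * conv_at(x, h, k) for a, k in zip(alphas, times))

# Right side: correlate x against g[t] = sum_i alpha_i h[t_i - t]
g = [sum(a * h[k - t] for a, k in zip(alphas, times) if 0 <= k - t < len(h))
     for t in range(len(x))]
rhs = sum(xt * gt for xt, gt in zip(x, g))

print(lhs, rhs)
```

Both quantities agree because each output sample is itself an inner product of `x` with a time-reversed, shifted copy of `h`.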


Problem  5.42  (Approximating  a  correlator  by  filtering  and  sampling)  Consider  the 
noisy  signal 

$$y(t) = s(t) + n(t)$$

where $s(t) = (1 - |t|) I_{[-1,1]}(t)$, and $n(t)$ is white noise with $S_n(f) = 0.1$.

(a)  Compute  the  SNR  at  the  output  of  the  integrator 


Z  = 


(b)  Can  you  improve  the  SNR  by  modifying  the  integration  in  (a),  while  keeping  the  processing 
linear?  If  so,  say  how.  If  not,  say  why  not. 

(c) Now, suppose that $y(t)$ is passed through a filter with impulse response $h(t) = I_{[0,1]}(t)$ to
obtain $z(t) = (y * h)(t)$. If you were to sample the filter output at a single time $t = t_0$, how
would you choose $t_0$ so as to maximize the SNR?

(d) In the setting of (c), if you were now allowed to take three samples at times $t_1$, $t_2$ and $t_3$ and
generate a linear combination $a_1 z(t_1) + a_2 z(t_2) + a_3 z(t_3)$, how would you choose $\{a_i\}$, $\{t_i\}$ to
improve the SNR relative to (c)? (We are looking for intuitively sensible answers rather than a
provably optimal choice.)

Hint:  See  Problem  5.41.  Taking  linear  combinations  of  samples  at  the  output  of  a  filter  is 
equivalent  to  correlation  with  an  appropriate  waveform,  which  we  can  choose  to  approximate  the 
optimal  correlator. 


Mathematical  derivations 


Problem 5.43 (Bounds on the Q function) We derive the bounds (5.117) and (5.116) for

$$Q(x) = \int_x^{\infty} \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\, dt \tag{5.109}$$

(a) Show that, for $x \ge 0$, the following upper bound holds:

$$Q(x) \le \frac{1}{2} e^{-x^2/2}$$

Hint: Try pulling out a factor of $e^{-x^2/2}$ from (5.109), and then bounding the resulting integrand.
Observe that $t \ge x \ge 0$ in the integration interval.

(b) For $x > 0$, derive the following upper and lower bounds for the Q function:

$$\left(1 - \frac{1}{x^2}\right) \frac{e^{-x^2/2}}{\sqrt{2\pi}\, x} \le Q(x) \le \frac{e^{-x^2/2}}{\sqrt{2\pi}\, x}$$

Hint: Write the integrand in (5.109) as a product of $1/t$ and $t e^{-t^2/2}$ and then integrate by parts
to get the upper bound. Integrate by parts once more using a similar trick to get the lower
bound. Note that you can keep integrating by parts to get increasingly refined upper and lower
bounds.
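
Before deriving the bounds in Problem 5.43, it can help to see how tight they are numerically. The sketch below evaluates $Q(x)$ through the standard identity $Q(x) = \frac{1}{2}\mathrm{erfc}(x/\sqrt{2})$ using Python's `math.erfc`, and compares it with the three bounds at a few arbitrarily chosen points; the function names are illustrative.

```python
import math


def Q(x):
    """Gaussian tail probability, via Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def q_bounds(x):
    """Return (lower, upper, simple_upper) bounds on Q(x) for x > 0."""
    gauss = math.exp(-x * x / 2.0) / (math.sqrt(2.0 * math.pi) * x)
    lower = (1.0 - 1.0 / (x * x)) * gauss    # lower bound from part (b)
    upper = gauss                            # upper bound from part (b)
    simple = 0.5 * math.exp(-x * x / 2.0)    # upper bound from part (a)
    return lower, upper, simple


for x in (0.5, 1.0, 2.0, 4.0):
    lower, upper, simple = q_bounds(x)
    print(f"x={x}: {lower:.3e} <= Q={Q(x):.3e} <= {upper:.3e}; simple bound {simple:.3e}")
```

The lower bound is negative (hence uninformative) for $x < 1$, while both bounds in (b) become tight as $x$ grows.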


Problem 5.44 (Geometric derivation of Q function bound) Let $X_1$ and $X_2$ denote independent
standard Gaussian random variables.

(a) For $a > 0$, express $P[|X_1| > a, |X_2| > a]$ in terms of the Q function.

(b) Find $P[X_1^2 + X_2^2 > 2a^2]$.

Hint: Transform to polar coordinates. Or use the results of Problem 5.21.

(c) Sketch the regions in the $(x_1, x_2)$ plane corresponding to the events considered in (a) and (b).

(d) Use (a)-(c) to obtain an alternative derivation of the bound $Q(x) \le \frac{1}{2} e^{-x^2/2}$ for $x \ge 0$ (i.e.,
the bound in Problem 5.43(a)).


Problem  5.45  (Cauchy-Schwarz  inequality  for  random  variables)  For  random  variables 
X  and  Y  defined  on  a  common  probability  space,  define  the  mean  squared  error  in  approximating 
X  by  a  multiple  of  Y  as 

$$J(a) = E[(X - aY)^2]$$

where $a$ is a scalar. Assume that both random variables are nontrivial (i.e., neither of them is
zero with probability one).

(a) Show that

$$J(a) = E[X^2] + a^2 E[Y^2] - 2a E[XY]$$

(b) Since $J(a)$ is quadratic in $a$, it has a global minimum (corresponding to the best approximation
of $X$ by a multiple of $Y$). Show that this is achieved for $a_{opt} = \frac{E[XY]}{E[Y^2]}$.

(c) Show that the mean squared error in the best approximation found in (b) can be written as

$$J(a_{opt}) = E[X^2] - \frac{(E[XY])^2}{E[Y^2]}$$

(d) Since the approximation error is nonnegative, conclude that

$$(E[XY])^2 \le E[X^2] E[Y^2] \tag{5.110}$$

This is the Cauchy-Schwarz inequality for random variables.

(e) Conclude also that equality is achieved in (5.110) if and only if $X$ and $Y$ are scalar multiples
of each other.

Hint: Equality corresponds to $J(a_{opt}) = 0$.
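
The quantities in Problem 5.45 can be explored numerically by replacing expectations with averages over a small data set, which amounts to taking expectations under the empirical distribution. The paired samples `xs` and `ys` below are made-up values used only for illustration.

```python
def mean(values):
    """Arithmetic mean, standing in for the expectation E[.]."""
    return sum(values) / len(values)


# Hypothetical paired samples of (X, Y)
xs = [1.0, 2.0, -1.0, 0.5]
ys = [0.5, 1.5, -2.0, 1.0]

Exx = mean([x * x for x in xs])
Eyy = mean([y * y for y in ys])
Exy = mean([x * y for x, y in zip(xs, ys)])


def J(a):
    """Mean squared error in approximating X by a * Y."""
    return mean([(x - a * y) ** 2 for x, y in zip(xs, ys)])


a_opt = Exy / Eyy                      # minimizer of the quadratic J(a)
J_min = Exx - Exy ** 2 / Eyy           # closed form for J(a_opt)

print("a_opt =", a_opt)
print("J(a_opt) =", J(a_opt), " closed form =", J_min)
print("Cauchy-Schwarz holds:", Exy ** 2 <= Exx * Eyy)
```

Perturbing `a_opt` in either direction increases `J`, consistent with the quadratic having a unique global minimum.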


Problem 5.46 (Normalized correlation) (a) Apply the Cauchy-Schwarz inequality in the
previous problem to "zero mean" versions of the random variables, $X_1 = X - E[X]$, $Y_1 = Y - E[Y]$,
to obtain that

$$|\mathrm{cov}(X, Y)| \le \sqrt{\mathrm{var}(X)\,\mathrm{var}(Y)} \tag{5.111}$$

(b) Conclude that the normalized correlation $\rho(X, Y)$ defined in (5.59) lies in $[-1, 1]$.

(c) Show that $|\rho| = 1$ if and only if we can write $X = aY + b$. Specify the constants $a$ and $b$ in
terms of the means and covariances associated with the two random variables.


Problem  5.47  (Characteristic  function  of  a  Gaussian  random  vector)  Consider  a  Gaus¬ 
sian  random  vector  X  =  (Ad,  ...,Xm)T  ~  iV(m,  C).  The  characteristic  function  of  X  is  defined 
as  follows: 


0x(w)  =  E 


jj^  \ej(wiXi+...+WmXm 


’] 


(5.112) 


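The characteristic function completely characterizes the distribution of a random vector, even if a
density does not exist. If the density does exist, the characteristic function is a multidimensional
inverse Fourier transform of it:

$$\phi_{\mathbf{X}}(\mathbf{w}) = E\left[e^{j\mathbf{w}^T\mathbf{X}}\right] = \int e^{j\mathbf{w}^T\mathbf{x}}\, p_{\mathbf{X}}(\mathbf{x})\, d\mathbf{x}$$

The density is therefore given by the corresponding Fourier transform

$$p_{\mathbf{X}}(\mathbf{x}) = \frac{1}{(2\pi)^m} \int e^{-j\mathbf{w}^T\mathbf{x}}\, \phi_{\mathbf{X}}(\mathbf{w})\, d\mathbf{w} \tag{5.113}$$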


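(a) Show that $Y = \mathbf{w}^T\mathbf{X}$ is a Gaussian random variable with mean $\mu = \mathbf{w}^T\mathbf{m}$ and variance
$v^2 = \mathbf{w}^T\mathbf{C}\mathbf{w}$.

(b) For $Y \sim N(\mu, v^2)$, show that

$$E\left[e^{jY}\right] = e^{j\mu - \frac{v^2}{2}}$$

(c) Use the result of (b) to obtain that the characteristic function of $\mathbf{X}$ is given by

$$\phi_{\mathbf{X}}(\mathbf{w}) = e^{j\mathbf{w}^T\mathbf{m} - \frac{1}{2}\mathbf{w}^T\mathbf{C}\mathbf{w}} \tag{5.114}$$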


which  depends  only  on  m  and  C. 

(d)  Since  the  distribution  of  X  is  completely  specified  by  its  characteristic  function,  conclude  that 
the  distribution  of  a  Gaussian  random  vector  depends  only  on  its  mean  vector  and  covariance 
matrix. When $\mathbf{C}$ is invertible, we can compute the density (5.58) by taking the Fourier transform
of the characteristic function in (5.114), but we skip that derivation.


Problem 5.48 Consider a zero mean WSS random process $X$ with autocorrelation function
$R_X(\tau)$. Let $Y_1(t) = (X * h_1)(t)$ and $Y_2(t) = (X * h_2)(t)$ denote random processes obtained by
passing $X$ through LTI systems with impulse responses $h_1$ and $h_2$, respectively.

(a) Find the crosscorrelation function $R_{Y_1,Y_2}(t_1, t_2)$.

Hint: You can use the approach employed to obtain (5.101), first finding $R_{Y_1,X}$ and then $R_{Y_1,Y_2}$.

(b) Are $Y_1$ and $Y_2$ jointly WSS?

(c) Suppose that $X$ is white noise with PSD $S_X(f) = 1$, $h_1(t) = I_{[0,1]}(t)$ and $h_2(t) = e^{-t} I_{[0,\infty)}(t)$.
Find $E[Y_1(0)Y_2(0)]$ and $E[Y_1(0)Y_2(1)]$.


Problem  5.49  (Cauchy-Schwarz  inequality  for  signals)  Consider  two  signals  (assume  real¬ 
valued  for  simplicity,  although  the  results  we  are  about  to  derive  apply  for  complex-valued  signals 
as well) $u(t)$ and $v(t)$.

(a) We wish to approximate $u(t)$ by a scalar multiple of $v(t)$ so as to minimize the norm of the
error. Specifically, we wish to minimize

$$J(a) = \int |u(t) - a\,v(t)|^2\, dt = \|u - av\|^2 = \langle u - av,\, u - av \rangle$$

Show that

$$J(a) = \|u\|^2 + a^2\|v\|^2 - 2a\langle u, v\rangle$$


(b) Show that the quadratic function $J(a)$ is minimized by choosing $a = a_{\rm opt}$, given by

$$a_{\rm opt} = \frac{\langle u, v\rangle}{\|v\|^2}$$

Show that the corresponding approximation $a_{\rm opt} v$ can be written as a projection of $u$ along a
unit vector in the direction of $v$:

$$a_{\rm opt}\, v = \left\langle u, \frac{v}{\|v\|} \right\rangle \frac{v}{\|v\|}$$

(c) Show that the error due to the optimal setting is given by

$$J(a_{\rm opt}) = \|u\|^2 - \frac{|\langle u, v\rangle|^2}{\|v\|^2}$$

(d) Since the minimum error is non-negative, conclude that

$$|\langle u, v\rangle| \le \|u\|\,\|v\| \tag{5.115}$$


This  is  the  Cauchy-Schwarz  inequality,  which  applies  to  real-  and  complex-valued  signals  or 
vectors. 

(e)  Conclude  also  that  equality  in  (5.115)  occurs  if  and  only  if  u  is  a  scalar  multiple  of  v  or  if  v 
is  a  scalar  multiple  of  u.  (We  need  to  say  it  both  ways  in  case  one  of  the  signals  is  zero.) 


Problem 5.50 Consider a random process $X$ passed through a correlator $g$ to obtain

$$Z = \int X(t)\,g(t)\,dt$$

where $X(t)$, $g(t)$ are real-valued.

(a) Show that the mean and variance of $Z$ can be expressed in terms of the mean function and
autocovariance function of $X$ as follows:

$$E[Z] = \int m_X(t)\,g(t)\,dt = \langle m_X, g\rangle$$

$${\rm var}(Z) = \int\!\!\int C_X(t_1, t_2)\,g(t_1)\,g(t_2)\,dt_1\,dt_2$$

(b) Suppose now that $X$ is zero mean and WSS with autocorrelation $R_X(\tau)$. Show that the
variance of the correlator output can be written as

$${\rm var}(Z) = \int R_X(\tau)\,R_g(\tau)\,d\tau = \langle R_X, R_g\rangle$$

where $R_g(\tau) = (g * g_{MF})(\tau) = \int g(t)\,g(t - \tau)\,dt$ is the “autocorrelation” of the waveform $g$.
Hint:  An  alternative  to  doing  this  from  scratch  is  to  use  the  equivalence  of  correlation  and 
matched  filtering.  You  can  then  employ  (5.101),  which  gives  the  output  autocorrelation  function 
when  a  WSS  process  is  sent  through  an  LTI  system,  evaluate  it  at  zero  lag  to  find  the  power, 
and  use  the  symmetry  of  autocorrelation  functions. 

Problems  drawing  on  material  from  Chapter  3  and  Appendix  5.E 

These  can  be  skipped  by  readers  primarily  interested  in  the  digital  communication  material  in  the 
succeeding  chapters. 

Problem  5.51  Consider  a  noisy  FM  signal  of  the  form 

$$v(t) = 20\cos(2\pi f_c t + \phi_s(t)) + n(t)$$

where $n(t)$ is WGN with power spectral density $N_0/2 = 10^{-5}$, and $\phi_s(t)$ is the instantaneous phase
deviation of the noiseless FM signal. Assume that the bandwidth of the noiseless FM signal is
100 KHz.

(a)  The  noisy  signal  v(t)  is  passed  through  an  ideal  BPF  which  exactly  spans  the  100  KHz 
frequency  band  occupied  by  the  noiseless  signal.  What  is  the  SNR  at  the  output  of  the  BPF? 

(b)  The  output  of  the  BPF  is  passed  through  an  ideal  phase  detector,  followed  by  a  differentiator 
which  is  normalized  to  give  unity  gain  at  10  KHz,  and  an  ideal  (unity  gain)  LPF  of  bandwidth 
10  KHz. 

(i)  Sketch  the  noise  PSD  at  the  output  of  the  differentiator. 

(ii)  Find  the  noise  power  at  the  output  of  the  LPF. 

Problem 5.52 An FM signal of bandwidth 210 KHz is received at a power of -90 dBm, and is
corrupted by bandpass AWGN with two-sided PSD $10^{-22}$ watts/Hz. The message bandwidth is
5 KHz, and the peak-to-average power ratio for the message is 10 dB.

(a)  What  is  the  SNR  (in  dB)  for  the  received  FM  signal?  (Assume  that  the  noise  is  bandlimited 
to  the  band  occupied  by  the  FM  signal.) 

(b)  Estimate  the  peak  frequency  deviation. 

(c)  The  noisy  FM  signal  is  passed  through  an  ideal  phase  detector.  Estimate  and  sketch  the 
noise  PSD  at  the  output  of  the  phase  detector,  carefully  labeling  the  axes. 

(d)  The  output  of  the  phase  detector  is  passed  through  a  differentiator  with  transfer  function 
H(f)  =  jf,  and  then  an  ideal  lowpass  filter  of  bandwidth  5  kHz.  Estimate  the  SNR  (in  dB)  at 
the  output  of  the  lowpass  filter. 

Problem  5.53  A  message  signal  m(t)  is  modeled  as  a  zero  mean  random  process  with  PSD 

$$S_m(f) = |f|\, I_{[-2,2]}(f)$$

We generate an SSB signal as follows:

$$u(t) = 20\,[m(t)\cos 200\pi t - \check{m}(t)\sin 200\pi t]$$

where $\check{m}$ denotes the Hilbert transform of $m$.

(a) Find the power of $m$ and the power of $u$.

(b) The noisy received signal is given by $y(t) = u(t) + n(t)$, where $n$ is passband AWGN with PSD
$N_0/2 = 1$, and is independent of $u$. Draw the block diagram for an ideal synchronous demodulator
for extracting the message $m$ from $y$, specifying the carrier frequency as well as the bandwidth
of the LPF, and find the SNR at the output of the demodulator.

(c)  Find  the  signal-to-noise-plus-interference  ratio  if  the  local  carrier  for  the  synchronous  demod¬ 
ulator  has  a  phase  error  of  |. 


5.A  Q function bounds and asymptotics

The  following  upper  and  lower  bounds  on  Q(x)  (derived  in  Problem  5.43)  are  asymptotically 
tight  for  large  arguments;  that  is,  the  difference  between  the  bounds  tends  to  zero  as  x  gets 
large. 

Bounds  on  Q(x),  asymptotically  tight  for  large  arguments 

$$\left(1 - \frac{1}{x^2}\right)\frac{e^{-x^2/2}}{x\sqrt{2\pi}} \le Q(x) \le \frac{e^{-x^2/2}}{x\sqrt{2\pi}}, \quad x > 0 \tag{5.116}$$

The asymptotic behavior (5.53) follows from these bounds. However, they do not work well for
small $x$ (the upper bound blows up to $\infty$, and the lower bound to $-\infty$, as $x \to 0$). The following
upper bound is useful for both small and large values of $x \ge 0$: it gives accurate results for small
$x$, and, while it is not as tight as the bounds (5.116) for large $x$, it does give the correct exponent
of decay.

Upper  bound  on  Q(x)  useful  for  both  small  and  large  arguments 

$$Q(x) \le \frac{1}{2}\,e^{-x^2/2}, \quad x \ge 0 \tag{5.117}$$


Figure 5.29: The Q function and bounds.


Figure 5.29 plots $Q(x)$ and its bounds for positive $x$. A logarithmic scale is used for the values
of the function in order to demonstrate the rapid decay with $x$. The bounds (5.116) are seen to
be tight even at moderate values of $x$ (say $x > 2$), whereas the bound (5.117) shows the right rate
of decay for large $x$ while also remaining useful for small $x$.
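
As a quick numerical check of the bounds (5.116) and (5.117), the following Python sketch evaluates them against $Q(x)$ computed from the complementary error function, using $Q(x) = \frac{1}{2}{\rm erfc}(x/\sqrt{2})$. The function names and the sample values of $x$ are illustrative choices, not part of the text.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = P[N(0,1) > x], via erfc."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def q_upper_tail(x):
    """Upper bound in (5.116): exp(-x^2/2) / (x sqrt(2 pi)), for x > 0."""
    return math.exp(-x * x / 2.0) / (x * math.sqrt(2.0 * math.pi))

def q_lower_tail(x):
    """Lower bound in (5.116): (1 - 1/x^2) times the upper bound, for x > 0."""
    return (1.0 - 1.0 / (x * x)) * q_upper_tail(x)

def q_upper_simple(x):
    """Upper bound (5.117): (1/2) exp(-x^2/2), for x >= 0."""
    return 0.5 * math.exp(-x * x / 2.0)

# Tabulate the function and its bounds at a few illustrative arguments.
for x in (0.5, 1.0, 2.0, 4.0):
    print(f"x={x}: lower={q_lower_tail(x):.3e}  Q={q_function(x):.3e}  "
          f"upper={q_upper_tail(x):.3e}  simple={q_upper_simple(x):.3e}")
```

For small $x$ the lower bound in (5.116) is negative and the upper bound exceeds one, so only (5.117) is informative there; for larger $x$ the two bounds in (5.116) close in on $Q(x)$.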


5.B  Approximations  using  Limit  Theorems 


We  often  deal  with  sums  of  independent  (or  approximately  independent)  random  variables.  Find¬ 
ing  the  exact  distribution  of  such  sums  can  be  cumbersome.  This  is  where  limit  theorems,  which 
characterize  what  happens  to  these  sums  as  the  number  of  terms  gets  large,  come  in  handy. 

Law of large numbers (LLN): Suppose that $X_1, X_2, \ldots$ are i.i.d. random variables with finite
mean $m$. Then their empirical average $(X_1 + \cdots + X_n)/n$ converges to their statistical average
$E[X_i] = m$ as $n \to \infty$. (Let us not worry about exactly how convergence is defined for a sequence
of random variables.)

When  we  do  a  simulation  to  estimate  some  quantity  of  interest  by  averaging  over  multiple  runs, 
we  are  relying  on  the  LLN.  The  LLN  also  underlies  all  of  information  theory,  which  is  the  basis 
for  computing  performance  benchmarks  for  coded  communication  systems. 
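
A minimal simulation sketch of the LLN at work, assuming i.i.d. samples uniform on $[0, 1)$ (so that the statistical average is $m = 0.5$); the distribution, the sample sizes, and the function name are illustrative assumptions.

```python
import random

def empirical_average(n, rng):
    """Average of n i.i.d. samples drawn uniformly from [0, 1)."""
    return sum(rng.random() for _ in range(n)) / n

rng = random.Random(0)  # fixed seed so that the run is repeatable
for n in (10, 1000, 100000):
    # The empirical average should settle near the statistical average 0.5.
    print(n, empirical_average(n, rng))
```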

The  LLN  tells  us  that  the  empirical  average  of  i.i.d.  random  variables  tends  to  the  statistical 
average.  The  central  limit  theorem  characterizes  the  variation  around  the  statistical  average. 

Central limit theorem (CLT): Suppose that $X_1, X_2, \ldots$ are i.i.d. random variables with finite
mean $m$ and variance $v^2$. Then the distribution of $Y_n = \frac{X_1 + \cdots + X_n - nm}{\sqrt{n v^2}}$ tends to that of a standard
Gaussian random variable. Specifically,

$$\lim_{n \to \infty} P\left[\frac{X_1 + \cdots + X_n - nm}{\sqrt{n v^2}} \le x\right] = \Phi(x) \tag{5.118}$$


Notice that the sum $S_n = X_1 + \cdots + X_n$ has mean $nm$ and variance $nv^2$. Thus, the CLT is telling
us that $Y_n$, a normalized, zero mean, unit variance version of $S_n$, has a distribution that tends to
$N(0, 1)$ as $n$ gets large. In practical terms, this translates to using the CLT to approximate $S_n$
as a Gaussian random variable with mean $nm$ and variance $nv^2$, for “large enough” $n$. In many
scenarios, the CLT kicks in rather quickly, and the Gaussian approximation works well for values
of $n$ as small as 6-10.


Example 5.B.1 (Gaussian approximation for a binomial distribution) Consider a binomial
random variable with parameters $n$ and $p$. We know that we can write it as $S_n = X_1 + \cdots + X_n$,
where $X_i$ are i.i.d. Bernoulli($p$). Note that $E[X_i] = p$ and ${\rm var}(X_i) = p(1-p)$, so that $S_n$ has mean
$np$ and variance $np(1-p)$. We can therefore approximate Binomial($n, p$) by $N(np, np(1-p))$
according to the CLT. The CLT tells us that we can approximate the CDF of a binomial by a
Gaussian: thus, the integral of the Gaussian density over $(-\infty, k]$ should approximate the sum
of the binomial pmf from 0 to $k$. The plot in Figure 5.30 shows that the Gaussian density itself
(with mean $np = 6$ and variance $np(1-p) = 4.2$) approximates the binomial pmf quite well
around the mean, so that we do expect the corresponding CDFs to be close.


Figure 5.30: A binomial pmf with parameters $n = 20$ and $p = 0.3$, and its $N(6, 4.2)$ Gaussian
approximation.
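
The comparison in Example 5.B.1 can be reproduced numerically. The sketch below, with illustrative function names, tabulates the Binomial(20, 0.3) pmf next to the $N(6, 4.2)$ density evaluated at the integers.

```python
import math

def binomial_pmf(k, n, p):
    """P[S_n = k] for S_n ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

n, p = 20, 0.3
mean, var = n * p, n * p * (1.0 - p)  # 6 and 4.2, as in the example

# Compare the pmf with the Gaussian density at each integer near the mean.
for k in range(0, 13):
    print(f"k={k:2d}  pmf={binomial_pmf(k, n, p):.4f}  "
          f"gaussian={gaussian_pdf(k, mean, var):.4f}")
```

The two columns agree most closely near the mean $k = 6$ and drift apart in the tails, where the binomial pmf is skewed and the Gaussian density is symmetric.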


5.C  Noise  Mechanisms 


We  have  discussed  mathematical  models  for  noise.  We  provide  here  some  motivation  and  physical 
feel  for  how  noise  arises. 

Thermal  Noise:  Even  in  a  resistor  that  has  no  external  voltage  applied  across  it,  the  charge 
carriers  exhibit  random  motion  because  of  thermal  agitation,  just  as  the  molecules  in  a  gas  do. 
The  amount  of  motion  depends  on  the  temperature,  and  results  in  thermal  noise.  Since  the 
charge  carriers  are  equally  likely  to  move  in  either  direction,  the  voltages  and  currents  associated 
with  thermal  noise  have  zero  DC  value.  We  therefore  quantify  the  noise  power,  or  the  average 
squared  values  of  voltages  and  currents  associated  with  the  noise.  These  were  first  measured  by 
Johnson,  and  then  explained  by  Nyquist  based  on  statistical  thermodynamics  arguments,  in  the 
1920s.  As  a  result,  thermal  noise  is  often  called  Johnson  noise,  or  Johnson-Nyquist  noise. 

Using arguments that we shall not go into, Nyquist concluded that the mean squared value of
the voltage associated with a resistor $R$, measured in a small frequency band $[f, f + \Delta f]$, is given
by

$$\overline{v_n^2}(f, \Delta f) = 4RkT\,\Delta f \tag{5.119}$$

where $R$ is the resistance in ohms, $k = 1.38 \times 10^{-23}$ Joules/Kelvin is Boltzmann’s constant, and $T$
is the temperature in degrees Kelvin ($T_{\rm Kelvin} = T_{\rm centigrade} + 273$). Notice that the mean squared
voltage depends only on the width of the frequency band, not its location; that is, thermal noise
is white. Actually, a more accurate statistical mechanics argument does reveal a dependence on
frequency, as follows:

$$\overline{v_n^2}(f, \Delta f) = \frac{4Rhf\,\Delta f}{e^{\frac{hf}{kT}} - 1} \tag{5.120}$$

where $h = 6.63 \times 10^{-34}$ Joules/Hz denotes Planck’s constant, which relates the energy of a photon
to the frequency of the corresponding electromagnetic wave (readers may recall the famous
formula $E = h\nu$, where $\nu$ is the frequency of the photon). Now, $e^x \approx 1 + x$ for small $x$. Using
this in (5.120), we obtain that it reduces to (5.119) for $\frac{hf}{kT} \ll 1$, or $f \ll \frac{kT}{h} = f^*$. For $T = 290$ K,
we have $f^* \approx 6 \times 10^{12}$ Hz, or 6 THz. The practical operating range of communication frequencies
today is much less than this (existing and emerging systems operate well below 100 GHz), so
that thermal noise is indeed very well modeled as white for current practice.
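
The numbers above are easy to verify. The following sketch computes $f^* = kT/h$ and the ratio of the frequency-dependent expression (5.120) to the white approximation (5.119), which equals $x/(e^x - 1)$ with $x = hf/(kT)$. The function names and the sample frequencies are illustrative assumptions.

```python
import math

K_BOLTZMANN = 1.38e-23  # Joules/Kelvin, value used in the text
H_PLANCK = 6.63e-34     # Joules/Hz, value used in the text

def corner_frequency(temp_kelvin):
    """f* = kT/h: thermal noise is well modeled as white for f << f*."""
    return K_BOLTZMANN * temp_kelvin / H_PLANCK

def planck_to_white_ratio(freq_hz, temp_kelvin):
    """Ratio of (5.120) to (5.119), i.e. x / (e^x - 1) with x = hf/(kT)."""
    x = H_PLANCK * freq_hz / (K_BOLTZMANN * temp_kelvin)
    if x == 0.0:
        return 1.0  # limiting value as f -> 0
    return x / math.expm1(x)

temp = 290.0
print(f"f* = {corner_frequency(temp):.3e} Hz")  # about 6 THz
for f in (1e9, 100e9, 6e12):
    print(f"f = {f:.1e} Hz: ratio = {planck_to_white_ratio(f, temp):.4f}")
```

At 100 GHz and 290 K the ratio is still within about one percent of unity, consistent with treating thermal noise as white over current operating frequencies.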

For bandwidth $B$, (5.119) yields the mean squared voltage

$$\overline{v_n^2} = 4RkTB$$

Now, if we connect the noise source to a matched load of impedance $R$, half of the source voltage
appears across the load, so that the average power delivered to the load is

$$P_n = \frac{\overline{(v_n/2)^2}}{R} = kTB \tag{5.121}$$

The preceding calculation provides a valuable benchmark, giving the communication link designer
a  ballpark  estimate  of  how  much  noise  power  to  expect  in  a  receiver  operating  over  a  bandwidth  B. 
Of  course,  the  noise  for  a  particular  receiver  is  typically  higher  than  this  benchmark,  and  must  be 
calculated  based  on  detailed  modeling  and  simulation  of  internal  and  external  noise  sources,  and 
the  gains,  input  impedances,  and  output  impedances  for  various  circuit  components.  However, 
while  the  circuit  designer  must  worry  about  these  details,  once  the  design  is  complete,  he  or  she 
can  supply  the  link  designer  with  a  single  number  for  the  noise  power  at  the  receiver  output, 
referred  to  the  benchmark  (5.121). 
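
As a worked instance of the benchmark (5.121), the sketch below evaluates $kTB$ in dBm. The temperature of 290 K and the sample bandwidths are illustrative assumptions, as are the function names.

```python
import math

K_BOLTZMANN = 1.38e-23  # Joules/Kelvin, value used in the text

def thermal_noise_power_watts(temp_kelvin, bandwidth_hz):
    """Benchmark noise power kTB from (5.121), in watts."""
    return K_BOLTZMANN * temp_kelvin * bandwidth_hz

def watts_to_dbm(power_watts):
    """Convert a power in watts to dBm (decibels relative to 1 milliwatt)."""
    return 10.0 * math.log10(power_watts / 1e-3)

for bandwidth in (1.0, 1e6, 20e6):
    p_n = thermal_noise_power_watts(290.0, bandwidth)
    print(f"B = {bandwidth:.0e} Hz: kTB = {watts_to_dbm(p_n):.1f} dBm")
```

At 290 K this gives roughly -174 dBm in a 1 Hz band and roughly -114 dBm in a 1 MHz band; a receiver's actual noise power is then quoted relative to this benchmark.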

Shot  noise:  Shot  noise  occurs  because  of  the  discrete  nature  of  the  charge  carriers.  When  a 
voltage  applied  across  a  device  causes  current  to  flow,  if  we  could  count  the  number  of  charge 
carriers  going  from  one  point  in  the  device  to  the  other  (e.g.,  from  the  source  to  the  drain  of 
a transistor) over a time period $\tau$, we would see a random number $N(\tau)$, which would vary
independently across disjoint time periods. Under rather general assumptions, $N(\tau)$ is well
modeled as a Poisson random variable with mean $\lambda\tau$, where $\lambda$ scales with the DC current. The
variance of a Poisson random variable equals its mean, so that the variance of the rate of charge
carrier flow equals

$${\rm var}\left(\frac{N(\tau)}{\tau}\right) = \frac{1}{\tau^2}\,{\rm var}(N(\tau)) = \frac{\lambda}{\tau}$$

We can think of this as the power of the shot noise. Thus, increasing the observation interval
$\tau$ smooths out the variations in charge carrier flow, and reduces the shot noise power. If we
now think of the device being operated over a bandwidth $B$, we know that we are effectively
observing the device at a temporal resolution $\tau \sim \frac{1}{B}$. Thus, shot noise power scales linearly with
$B$.
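
The $\lambda/\tau$ scaling can be checked with a small Monte Carlo sketch. It assumes that charge carrier arrivals form a Poisson process, simulated here through exponential interarrival times; the rate $\lambda = 50$, the observation intervals, and the function names are illustrative choices.

```python
import random

def poisson_count(rate, duration, rng):
    """Number of arrivals of a rate-`rate` Poisson process in [0, duration]."""
    count = 0
    t = rng.expovariate(rate)  # time of the first arrival
    while t <= duration:
        count += 1
        t += rng.expovariate(rate)  # add an exponential interarrival time
    return count

def flow_rate_variance(rate, duration, trials, rng):
    """Sample variance of N(tau)/tau over independent observation intervals."""
    samples = [poisson_count(rate, duration, rng) / duration for _ in range(trials)]
    mean = sum(samples) / trials
    return sum((s - mean) ** 2 for s in samples) / (trials - 1)

rng = random.Random(0)
rate = 50.0
for tau in (1.0, 2.0, 4.0):
    estimate = flow_rate_variance(rate, tau, 2000, rng)
    print(f"tau = {tau}: simulated variance = {estimate:.2f}, lambda/tau = {rate / tau:.2f}")
```

Doubling the observation interval halves the variance of the measured flow rate, which is the sense in which a smaller bandwidth (coarser temporal resolution) admits less shot noise power.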

The preceding discussion indicates that both thermal noise and shot noise are white, in that
their power scales linearly with the system bandwidth $B$, independent of the frequency band of
operation. Indeed, both phenomena involve random motions of a large number of charge carriers,
and can be analyzed together in a statistical mechanics framework. This is well beyond our scope,
but for our purpose, we can simply model the aggregate system noise due to these phenomena as
a single white noise process.

Flicker noise: Another commonly encountered form of noise is $1/f$ noise, also called flicker
noise, whose power increases as the frequency of operation gets smaller. The sources of $1/f$
noise are poorly understood, and white noise dominates in the typical operating regimes for
communication  receivers.  For  example,  in  an  RF  system,  the  noise  in  the  front  end  (antenna, 
low  noise  amplifier,  mixer)  dominates  the  overall  system  noise,  and  1/f  noise  is  negligible  at 
these  frequencies.  We  therefore  ignore  1/f  noise  in  our  noise  modeling. 


5.D  The  structure  of  passband  random  processes 


We  discuss  here  the  modeling  of  passband  random  processes,  and  in  particular,  passband  white 
noise,  in  more  detail.  These  insights  are  useful  for  the  analysis  of  the  effect  of  noise  in  analog 
communication  systems,  as  in  Appendix  5.E. 

We  can  define  the  complex  envelope,  and  I  and  Q  components,  for  a  passband  random  process 
in  exactly  the  same  fashion  as  is  done  for  deterministic  signals  in  Chapter  2.  For  a  passband 
random  process,  each  sample  path  (observed  over  a  large  enough  time  window)  has  a  Fourier 
transform  restricted  to  passband.  We  can  therefore  define  complex  envelope,  I/Q  components 
and envelope/phase as we do for deterministic signals. For any given reference frequency $f_c$ in
the band of interest, any sample path $x_p(t)$ for a passband random process can be written as

$$x_p(t) = {\rm Re}\left(x(t)\,e^{j2\pi f_c t}\right)$$

$$x_p(t) = x_c(t)\cos 2\pi f_c t - x_s(t)\sin 2\pi f_c t$$

$$x_p(t) = e(t)\cos(2\pi f_c t + \theta(t))$$

where $x(t) = x_c(t) + jx_s(t) = e(t)e^{j\theta(t)}$ is the complex envelope, $x_c(t)$, $x_s(t)$ are the I and Q
components, respectively, and $e(t)$, $\theta(t)$ are the envelope and phase, respectively.

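PSD of complex envelope: Applying the standard frequency domain relationship to the
time-windowed sample paths, we have the frequency domain relationship

$$X_{p,T_o}(f) = \frac{1}{2}X_{T_o}(f - f_c) + \frac{1}{2}X^*_{T_o}(-f - f_c)$$

Since the two terms occupy disjoint frequency bands, the cross terms vanish on taking the squared
magnitude. We therefore have

$$|X_{p,T_o}(f)|^2 = \frac{1}{4}|X_{T_o}(f - f_c)|^2 + \frac{1}{4}|X^*_{T_o}(-f - f_c)|^2 = \frac{1}{4}|X_{T_o}(f - f_c)|^2 + \frac{1}{4}|X_{T_o}(-f - f_c)|^2$$

Dividing by $T_o$ and letting $T_o \to \infty$, we obtain

$$S_{x_p}(f) = \frac{1}{4}S_x(f - f_c) + \frac{1}{4}S_x(-f - f_c) \tag{5.122}$$

where $S_x(f)$ is baseband. Using (5.87), the one-sided passband PSD is given by

$$S^+_{x_p}(f) = \frac{1}{2}S_x(f - f_c) \tag{5.123}$$

Similarly, we can go from passband to complex baseband using the formula

$$S_x(f) = 2S^+_{x_p}(f + f_c) \tag{5.124}$$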

What about the I and Q components? Consider the complex envelope $x(t) = x_c(t) + jx_s(t)$. Its
autocorrelation function is given by

$$R_x(\tau) = \overline{x(t)\,x^*(t - \tau)} = \overline{(x_c(t) + jx_s(t))\,(x_c(t - \tau) - jx_s(t - \tau))}$$

which yields

$$R_x(\tau) = (R_{x_c}(\tau) + R_{x_s}(\tau)) + j\,(R_{x_s,x_c}(\tau) - R_{x_c,x_s}(\tau))$$
$$\phantom{R_x(\tau)} = (R_{x_c}(\tau) + R_{x_s}(\tau)) + j\,(R_{x_s,x_c}(\tau) - R_{x_s,x_c}(-\tau)) \tag{5.125}$$

Taking the Fourier transform, we obtain

$$S_x(f) = S_{x_c}(f) + S_{x_s}(f) + j\,(S_{x_s,x_c}(f) - S^*_{x_s,x_c}(f))$$

which simplifies to

$$S_x(f) = S_{x_c}(f) + S_{x_s}(f) - 2\,{\rm Im}\,(S_{x_s,x_c}(f)) \tag{5.126}$$

For simplicity, we henceforth consider situations in which $S_{x_s,x_c}(f) \equiv 0$ (i.e., the I and Q
components are uncorrelated). Actually, for a given passband random process, even if the I and
Q components for a given frequency reference are uncorrelated, we can make them correlated
by shifting the frequency reference. However, such subtleties are not required for our purpose,
which is to model digitally modulated signals and receiver noise.


5.D.1  Baseband representation of passband white noise

Consider passband white noise as shown in Figure 5.21. If we choose the reference frequency as
the center of the band, then we get a simple model for the complex envelope and the I and Q
components of the noise, as depicted in Figure 5.31. The complex envelope has PSD

$$S_n(f) = 2N_0, \quad |f| \le B/2$$

and the I and Q components have PSDs and cross-spectrum given by

$$S_{n_c}(f) = S_{n_s}(f) = N_0, \quad |f| \le B/2$$

$$S_{n_s,n_c}(f) \equiv 0$$


Figure 5.31: PSD of I and Q components, and complex envelope, of passband white noise.
(Panels: PSD of I and Q components, $S_{n_c}(f) = S_{n_s}(f) = N_0$ for $|f| \le B/2$; PSD of complex envelope.)


Note that the power of the complex envelope is $2N_0B$, which is twice the power of the corresponding
passband noise $n_p$. This is consistent with the convention in Chapter 2 for deterministic,
finite-energy signals, where the complex envelope has twice the energy of the corresponding passband
signal. Later, when we discuss digital communication receivers and their performance in
Chapter 6, we find it convenient to scale signals and noise in complex baseband such that we get
rid of this factor of two. In this case, we obtain that the PSDs of the I and Q components
are given by $S_{n_c}(f) = S_{n_s}(f) = N_0/2$.


Passband  White  Noise  is  Circularly  Symmetric 


Figure 5.32: Circular symmetry implies that the PSD of the baseband noise $n_b(t)$ is independent
of $\theta$. (Block diagram: $n_p(t)$ is multiplied by $2\cos(2\pi f_c t - \theta)$ and passed through a lowpass filter to yield $n_b(t)$.)


An  important  property  of  passband  white  noise  is  its  circular  symmetry:  the  statistics  of  the  I 
and  Q  components  are  unchanged  if  we  change  the  phase  reference.  To  understand  what  this 
means in practical terms, consider the downconversion operation shown in Figure 5.32, which
yields a baseband random process $n_b(t)$. Circular symmetry corresponds to the assumption that
the PSD of $n_b$ does not depend on $\theta$. Thus, it immediately implies that

$$S_{n_c}(f) = S_{n_s}(f) \;\Leftrightarrow\; R_{n_c}(\tau) = R_{n_s}(\tau) \tag{5.127}$$

since $n_b = n_c$ for $\theta = 0$, and $n_b = n_s$ for $\theta = -\frac{\pi}{2}$, where $n_c$, $n_s$ are the I and Q components,
respectively, taking $f_c = f_0$ as a reference. Thus, changes in phase reference do not change the
statistics of the I and Q components.


5.E  SNR  Computations  for  Analog  Modulation 

We  now  compute  SNR  for  the  amplitude  and  angle  modulation  schemes  discussed  in  Chapter  3. 
Since  the  format  of  the  messages  is  not  restricted  in  our  analysis,  the  SNR  computations  apply 
to  digital  modulation  (where  the  messages  are  analog  waveforms  associated  with  a  particular 
sequence  of  bits  being  transmitted)  as  well  as  analog  modulation  (where  the  messages  are  typically 
“natural”  audio  or  video  waveforms  beyond  our  control).  However,  such  SNR  computations  are 
primarily  of  interest  for  analog  modulation,  since  the  performance  measure  of  interest  for  digital 
communication  systems  is  typically  probability  of  error. 


5.E.1  Noise  Model  and  SNR  Benchmark 

For noise modeling, we consider passband, circularly symmetric, white noise $n_p(t)$ in a system
of bandwidth $B$ centered around $f_c$, with PSD as shown in Figure 5.21. As discussed in Section
5.D.1, we can write this in terms of its I and Q components with respect to reference frequency
$f_c$ as

$$n_p(t) = n_c(t)\cos 2\pi f_c t - n_s(t)\sin 2\pi f_c t$$

where the relevant PSDs are given in Figure 5.31.

Baseband  benchmark:  When  evaluating  the  SNR  for  various  passband  analog  modulation 
schemes,  it  is  useful  to  consider  a  hypothetical  baseband  system  as  benchmark.  Suppose  that 
a  real-valued  message  of  bandwidth  Bm  is  sent  over  a  baseband  channel.  The  noise  power  over 
the  baseband  channel  is  given  by  Pn  =  N0Bm.  If  the  received  signal  power  is  Pr  =  Pm,  then  the 
SNR  benchmark  for  this  baseband  channel  is  given  by: 

$${\rm SNR}_b = \frac{P_r}{N_0 B_m} \tag{5.128}$$


5.E.2  SNR  for  Amplitude  Modulation 

We  now  quickly  sketch  SNR  computations  for  some  of  the  variants  of  AM.  The  signal  and  power 
computations  are  similar  to  earlier  examples  in  this  chapter,  so  we  do  not  belabor  the  details. 

SNR for DSB-SC: For message bandwidth $B_m$, the bandwidth of the passband received signal
is $B = 2B_m$. The received signal is given by

$$y_p(t) = A_c m(t)\cos(2\pi f_c t + \theta_r) + n_p(t)$$

where $\theta_r$ is the phase offset between the incoming carrier and the LO. The received signal power
is given by

$$P_r = \overline{(A_c m(t)\cos(2\pi f_c t + \theta_r))^2} = A_c^2 P_m/2$$

A coherent demodulator extracts the I component, which is given by

$$y_c(t) = A_c m(t)\cos\theta_r + n_c(t)$$

The signal power is

$$P_s = \overline{(A_c m(t)\cos\theta_r)^2} = A_c^2 P_m\cos^2\theta_r$$

while the noise power is

$$P_n = \overline{n_c^2(t)} = N_0 B = 2N_0 B_m$$

so that the SNR is

$${\rm SNR}_{DSB} = \frac{A_c^2 P_m\cos^2\theta_r}{2N_0 B_m} = \frac{P_r}{N_0 B_m}\cos^2\theta_r = {\rm SNR}_b\cos^2\theta_r \tag{5.129}$$

For ideal coherent demodulation (i.e., $\theta_r = 0$), we obtain that the SNR for DSB equals the
baseband benchmark ${\rm SNR}_b$ in (5.128).

SNR for SSB: For message bandwidth $B_m$, the bandwidth of the passband received signal is
$B = B_m$. The received signal is given by

$$y_p(t) = A_c m(t)\cos(2\pi f_c t + \theta_r) \pm A_c\check{m}(t)\sin(2\pi f_c t + \theta_r) + n_p(t)$$

where $\theta_r$ is the phase offset between the incoming carrier and the LO. The received signal power
is given by

$$P_r = \overline{(A_c m(t)\cos(2\pi f_c t + \theta_r))^2} + \overline{(A_c\check{m}(t)\sin(2\pi f_c t + \theta_r))^2} = A_c^2 P_m$$

A coherent demodulator extracts the I component, which is given by

$$y_c(t) = A_c m(t)\cos\theta_r \mp A_c\check{m}(t)\sin\theta_r + n_c(t)$$

The signal power is

$$P_s = \overline{(A_c m(t)\cos\theta_r)^2} = A_c^2 P_m\cos^2\theta_r$$

while the noise plus interference power is

$$P_n = \overline{n_c^2(t)} + \overline{(A_c\check{m}(t)\sin\theta_r)^2} = N_0 B + A_c^2 P_m\sin^2\theta_r = N_0 B_m + A_c^2 P_m\sin^2\theta_r$$

so that the signal-to-interference-plus-noise ratio (SINR) is

$${\rm SINR}_{SSB} = \frac{A_c^2 P_m\cos^2\theta_r}{N_0 B_m + A_c^2 P_m\sin^2\theta_r} = \frac{P_r\cos^2\theta_r}{N_0 B_m + P_r\sin^2\theta_r} = \frac{{\rm SNR}_b\cos^2\theta_r}{1 + {\rm SNR}_b\sin^2\theta_r} \tag{5.130}$$

This coincides with the baseband benchmark (5.128) for ideal coherent demodulation (i.e., $\theta_r = 0$).
However, for $\theta_r \ne 0$, even when the received signal power $P_r$ gets arbitrarily large relative to
the noise power, the SINR cannot be larger than $\frac{1}{\tan^2\theta_r}$, which shows the importance of making
the phase error as small as possible.
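
To see the different effects of phase error on DSB and SSB, the following sketch evaluates (5.129) and (5.130) for a fixed phase error as the benchmark SNR grows. The phase error of 10 degrees, the sample SNR values, and the function names are illustrative assumptions.

```python
import math

def snr_dsb(snr_b, theta_r):
    """DSB-SC output SNR from (5.129): SNR_b cos^2(theta_r)."""
    return snr_b * math.cos(theta_r) ** 2

def sinr_ssb(snr_b, theta_r):
    """SSB output SINR from (5.130)."""
    cos_sq = math.cos(theta_r) ** 2
    sin_sq = math.sin(theta_r) ** 2
    return snr_b * cos_sq / (1.0 + snr_b * sin_sq)

def to_db(ratio):
    """Convert a power ratio to decibels."""
    return 10.0 * math.log10(ratio)

theta_r = math.radians(10.0)                 # assumed phase error
ceiling = 1.0 / math.tan(theta_r) ** 2       # SSB limit as SNR_b grows
print(f"SSB SINR ceiling: {to_db(ceiling):.2f} dB")
for snr_b_db in (10.0, 30.0, 50.0):
    snr_b = 10.0 ** (snr_b_db / 10.0)
    print(f"SNR_b = {snr_b_db:.0f} dB: DSB = {to_db(snr_dsb(snr_b, theta_r)):.2f} dB, "
          f"SSB = {to_db(sinr_ssb(snr_b, theta_r)):.2f} dB")
```

For DSB the phase error costs a fixed factor $\cos^2\theta_r$ at every SNR, whereas for SSB the self-interference from the Hilbert transform term caps the SINR at $1/\tan^2\theta_r$ no matter how strong the received signal is.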

SNR  for  AM:  Now,  consider  conventional  AM.  While  we  would  typically  use  envelope  detection 
rather  than  coherent  demodulation  in  this  setting,  it  is  instructive  to  compute  SNR  for  both 
methods  of  demodulation.  For  message  bandwidth  Bm,  the  bandwidth  of  the  passband  received 
signal is $B = 2B_m$. The received signal is given by

$$y_p(t) = A_c\,(1 + a_{\rm mod}m_n(t))\cos(2\pi f_c t + \theta_r) + n_p(t) \tag{5.131}$$

where $m_n(t)$ is the normalized version of the message (with $\min_t m_n(t) = -1$), and where $\theta_r$ is
the phase offset between the incoming carrier and the LO. The received signal power is given by

$$P_r = \overline{(A_c\,(1 + a_{\rm mod}m_n(t))\cos(2\pi f_c t + \theta_r))^2} = A_c^2\,(1 + a_{\rm mod}^2 P_{m_n})/2 \tag{5.132}$$

where $P_{m_n} = \overline{m_n^2(t)}$ is the power of the normalized message (the cross term vanishes for a
message with zero DC value). A coherent demodulator extracts the I component, which is given by

$$y_c(t) = A_c\cos\theta_r + A_c a_{\rm mod}m_n(t)\cos\theta_r + n_c(t)$$

The power of the information-bearing part of the signal (the DC term due to the carrier carries
no information, and is typically rejected using AC coupling) is given by

$$P_s = \overline{(A_c a_{\rm mod}m_n(t)\cos\theta_r)^2} = A_c^2 a_{\rm mod}^2 P_{m_n}\cos^2\theta_r \tag{5.133}$$

Recall that the AM power efficiency is defined as the ratio of the power of the message-bearing
part of the signal to the power of the overall signal (which includes an unmodulated carrier), and
is given by

$$\eta_{AM} = \frac{a_{\rm mod}^2 P_{m_n}}{1 + a_{\rm mod}^2 P_{m_n}}$$

We can therefore write the signal power (5.133) at the output of the coherent demodulator in
terms of the received power in (5.132) as:

$$P_s = 2P_r\,\eta_{AM}\cos^2\theta_r$$

while the noise power is

$$P_n = \overline{n_c^2(t)} = N_0 B = 2N_0 B_m$$

Thus, the SNR is

$${\rm SNR}_{AM,coh} = \frac{P_s}{P_n} = \frac{2P_r\,\eta_{AM}\cos^2\theta_r}{2N_0 B_m} = {\rm SNR}_b\,\eta_{AM}\cos^2\theta_r \tag{5.134}$$

Thus, even with ideal coherent demodulation ($\theta_r = 0$), the SNR obtained in AM is less than that
of the baseband benchmark, since $\eta_{AM} < 1$ (typically much smaller than one). Of course, the
reason we incur this power inefficiency is to simplify the receiver, by message recovery using an
envelope detector. Let us now compute the SNR for the latter.
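
As a worked instance of (5.134), consider a sinusoidal normalized message, for which $\min_t m_n(t) = -1$ and $P_{m_n} = 1/2$. The sketch below computes the power efficiency and the resulting SNR penalty relative to the baseband benchmark; the choice of message, the modulation indices, and the function names are illustrative assumptions.

```python
import math

def am_power_efficiency(a_mod, p_mn):
    """AM power efficiency: a_mod^2 P_mn / (1 + a_mod^2 P_mn)."""
    return a_mod ** 2 * p_mn / (1.0 + a_mod ** 2 * p_mn)

def snr_am_coherent(snr_b, a_mod, p_mn, theta_r=0.0):
    """Coherent AM output SNR from (5.134)."""
    return snr_b * am_power_efficiency(a_mod, p_mn) * math.cos(theta_r) ** 2

p_mn = 0.5  # power of a unit-amplitude sinusoid (assumed message)
for a_mod in (0.5, 1.0):
    eta = am_power_efficiency(a_mod, p_mn)
    penalty_db = 10.0 * math.log10(eta)
    print(f"a_mod = {a_mod}: eta_AM = {eta:.4f}, SNR penalty = {penalty_db:.2f} dB")
```

Even at full modulation ($a_{\rm mod} = 1$) the efficiency for a sinusoidal message is only 1/3, so the output SNR falls short of the baseband benchmark by almost 5 dB; at $a_{\rm mod} = 0.5$ the efficiency drops to 1/9.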


Figure  5.33:  At  high  SNR,  the  envelope  of  an  AM  signal  is  approximately  equal  to  its  I  component 
relative  to  the  received  carrier  phase  reference. 


Expressing  the  passband  noise  in  the  received  signal  (5.131)  with  the  incoming  carrier  as  the 
reference,  we  have 

$$y_p(t) = A_c \left(1 + a_{mod} m_n(t)\right) \cos(2\pi f_c t + \theta_r) + n_c(t) \cos(2\pi f_c t + \theta_r) - n_s(t) \sin(2\pi f_c t + \theta_r)$$

where, by virtue of circular symmetry, $n_c$, $n_s$ have the PSDs and cross-spectra as in Figure 5.31,
regardless of $\theta_r$. That is,

$$y_p(t) = y_c(t) \cos(2\pi f_c t + \theta_r) - y_s(t) \sin(2\pi f_c t + \theta_r)$$

where, as shown in Figure 5.33,

$$y_c(t) = A_c \left(1 + a_{mod} m_n(t)\right) + n_c(t), \qquad y_s(t) = n_s(t)$$

At high SNR, the signal term is dominant, so that $|y_c(t)| \gg |y_s(t)|$. Furthermore, the AM
signal is positive (assuming $a_{mod} < 1$), so that $y_c > 0$ "most of the time," even though $n_c$ can be
negative. We therefore obtain that

$$e(t) = \sqrt{y_c^2(t) + y_s^2(t)} \approx |y_c(t)| \approx y_c(t)$$

That  is,  the  output  of  the  envelope  detector  is  approximated,  for  high  SNR,  as 

$$e(t) \approx A_c \left(1 + a_{mod} m_n(t)\right) + n_c(t)$$

The  right-hand  side  is  what  we  would  get  from  ideal  coherent  detection.  We  can  reuse  our  SNR 
computation  for  coherent  detection  to  conclude  that  the  SNR  at  the  envelope  detector  output  is 
given  by 

$$\mathrm{SNR}_{AM,envdet} = \mathrm{SNR}_b \, \eta_{AM} \tag{5.135}$$

Thus, for a properly designed ($a_{mod} < 1$) AM system operating at high SNR, the envelope
detector  approximates  the  performance  of  ideal  coherent  detection,  without  requiring  carrier 
synchronization. 
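As a rough numerical illustration of (5.134) and (5.135), the following Python sketch evaluates the AM power efficiency and the resulting SNR relative to the baseband benchmark. The function names and the example values (a sinusoidal normalized message with power 1/2 and a modulation index of 1) are assumptions chosen for illustration, not taken from the text.

```python
import math


def am_power_efficiency(a_mod, p_mn):
    """eta_AM = a_mod^2 P_mn / (1 + a_mod^2 P_mn)."""
    x = a_mod ** 2 * p_mn
    return x / (1.0 + x)


def am_snr_coherent(snr_b, a_mod, p_mn, theta_r=0.0):
    """SNR after coherent demodulation, following (5.134)."""
    return snr_b * am_power_efficiency(a_mod, p_mn) * math.cos(theta_r) ** 2


def am_snr_envelope(snr_b, a_mod, p_mn):
    """High-SNR approximation for envelope detection, following (5.135)."""
    return snr_b * am_power_efficiency(a_mod, p_mn)


def to_db(x):
    """Convert a power ratio to decibels."""
    return 10.0 * math.log10(x)


# Assumed example: sinusoidal normalized message (P_mn = 1/2), a_mod = 1.
eta = am_power_efficiency(1.0, 0.5)   # equals 1/3
loss_db = -to_db(eta)                 # SNR loss relative to the baseband benchmark
```

Under these assumptions the efficiency is 1/3, so even ideal coherent detection gives up roughly 4.8 dB relative to the baseband benchmark; a smaller modulation index or a message with lower normalized power would cost more.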


5.E.3  SNR  for  Angle  Modulation 

We  have  seen  how  to  compute  SNR  when  white  noise  adds  to  a  message  encoded  in  the  signal 
amplitude.  Let  us  now  see  what  happens  when  the  message  is  encoded  in  the  signal  phase  or 
frequency.  The  received  signal  is  given  by 


$$y_p(t) = A_c \cos(2\pi f_c t + \theta(t)) + n_p(t) \tag{5.136}$$

where $n_p(t)$ is passband white noise with one-sided PSD $N_0$ over the signal band of interest, and
where the message is encoded in the phase $\theta(t)$. For example,

$$\theta(t) = k_p m(t)$$

for phase modulation, and

$$\frac{1}{2\pi} \frac{d\theta(t)}{dt} = k_f m(t)$$


for  frequency  modulation.  We  wish  to  understand  how  the  additive  noise  np(t)  perturbs  the 
phase. 

Decomposing the passband noise into I and Q components with respect to the phase of the
noiseless  angle  modulated  signal,  we  can  rewrite  the  received  signal  as  follows: 


$$\begin{aligned} y_p(t) &= A_c \cos(2\pi f_c t + \theta(t)) + n_c(t) \cos(2\pi f_c t + \theta(t)) - n_s(t) \sin(2\pi f_c t + \theta(t)) \\ &= \left(A_c + n_c(t)\right) \cos(2\pi f_c t + \theta(t)) - n_s(t) \sin(2\pi f_c t + \theta(t)) \end{aligned} \tag{5.137}$$

Figure 5.34: I and Q components of a noisy angle modulated signal with the phase reference
chosen as the phase of the noiseless signal.

where $n_c$, $n_s$ have PSDs as in Figure 5.31 (with cross-spectrum $S_{n_s,n_c}(f) = 0$), thanks to circular
symmetry (we assume that it applies approximately even though the phase reference $\theta(t)$ is
time-varying). The I and Q components with respect to this phase reference are shown in Figure 5.34,
so that the corresponding complex envelope can be written as

$$y(t) = e(t) e^{j\theta_n(t)}$$

where

$$e(t) = \sqrt{\left(A_c + n_c(t)\right)^2 + n_s^2(t)} \tag{5.138}$$

and

$$\theta_n(t) = \tan^{-1} \frac{n_s(t)}{A_c + n_c(t)} \tag{5.139}$$

The passband signal in (5.137) can now be rewritten as

$$y_p(t) = \mathrm{Re}\left(y(t) e^{j(2\pi f_c t + \theta(t))}\right) = \mathrm{Re}\left(e(t) e^{j\theta_n(t)} e^{j(2\pi f_c t + \theta(t))}\right) = e(t) \cos\left(2\pi f_c t + \theta(t) + \theta_n(t)\right)$$

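At high SNR, $A_c \gg |n_c|$ and $A_c \gg |n_s|$. Thus,

$$\left| \frac{n_s(t)}{A_c + n_c(t)} \right| \ll 1$$

and

$$\frac{n_s(t)}{A_c + n_c(t)} \approx \frac{n_s(t)}{A_c}$$

For $|x|$ small, $\tan x \approx x$, and hence $x \approx \tan^{-1} x$. We therefore obtain the following high SNR
approximation for the phase perturbation due to the noise:

$$\theta_n(t) = \tan^{-1} \frac{n_s(t)}{A_c + n_c(t)} \approx \frac{n_s(t)}{A_c} \quad \text{(high SNR approximation)} \tag{5.140}$$

To summarize, we can model the received signal (5.136) as

$$y_p(t) \approx A_c \cos\left(2\pi f_c t + \theta(t) + \frac{n_s(t)}{A_c}\right) \quad \text{(high SNR approximation)} \tag{5.141}$$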


Thus,  the  Q  component  (relative  to  the  desired  signal’s  phase  reference)  of  the  passband  white 
noise  appears  as  phase  noise,  but  is  scaled  down  by  the  signal  amplitude. 
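The quality of the high SNR approximation for the phase perturbation can be checked numerically. The Python sketch below draws independent Gaussian samples for the I and Q noise components (an assumption made for illustration; the names and parameter values are hypothetical) and compares the exact phase of the complex envelope with the Q component scaled by the signal amplitude.

```python
import math
import random


def phase_perturbation(a_c, n_c, n_s):
    """Exact phase of the complex envelope (A_c + n_c) + j n_s.

    atan2 is used so that the phase stays correct even when A_c + n_c
    becomes negative, which is exactly where the approximation fails.
    """
    return math.atan2(n_s, a_c + n_c)


def rms_approx_error(a_c, sigma, trials=20000, seed=0):
    """RMS of (exact phase - n_s/A_c), normalized by the RMS of n_s/A_c."""
    rng = random.Random(seed)
    err2 = 0.0
    ref2 = 0.0
    for _ in range(trials):
        n_c = rng.gauss(0.0, sigma)
        n_s = rng.gauss(0.0, sigma)
        approx = n_s / a_c
        err2 += (phase_perturbation(a_c, n_c, n_s) - approx) ** 2
        ref2 += approx ** 2
    return math.sqrt(err2 / ref2)


high_snr_error = rms_approx_error(a_c=100.0, sigma=1.0)  # amplitude >> noise
low_snr_error = rms_approx_error(a_c=2.0, sigma=1.0)     # amplitude ~ noise
```

With the amplitude far above the noise level, the relative error should be on the order of the noise-to-amplitude ratio; as the amplitude approaches the noise level, the error grows sharply.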
FM Noise Analysis

Let  us  apply  the  preceding  to  develop  an  analysis  of  the  effects  of  white  noise  on  FM.  It  is 
helpful,  but  not  essential,  to  have  read  Chapter  3  for  this  discussion.  Suppose  that  we  have  an 
ideal  detector  for  the  phase  of  the  noisy  signal  in  (5.141),  and  that  we  differentiate  it  to  recover 
a  message  encoded  in  the  frequency.  (For  those  who  have  read  Chapter  3,  we  are  talking  about 
an  ideal  limiter-discriminator).  The  output  is  the  instantaneous  frequency  deviation,  given  by 

$$z(t) = \frac{1}{2\pi} \frac{d}{dt} \left(\theta(t) + \theta_n(t)\right) \approx k_f m(t) + \frac{n_s'(t)}{2\pi A_c} \tag{5.142}$$

using  the  high  SNR  approximation  (5.140). 


Figure  5.35:  Block  diagram  for  FM  system  using  limiter-discriminator  demodulation. 


[Figure 5.36 labels: PSD of noiseless FM signal; Before limiter-discriminator; After limiter-discriminator]

Figure 5.36: PSDs of signal and noise before and after limiter-discriminator.


We now analyze the performance of an FM system whose block diagram is shown in Figure
5.35. For wideband FM, the bandwidth $B_{RF}$ of the received signal $y_p(t)$ is significantly larger
than the message bandwidth $B_m$: $B_{RF} \approx 2(\beta + 1) B_m$ by Carson's formula, where $\beta > 1$. Thus,
the RF front end in Figure 5.35 lets in passband white noise $n_p(t)$ of bandwidth of the order
of $B_{RF}$, as shown in Figure 5.36. Figure 5.36 also shows the PSDs once we have passed the
received signal through the limiter-discriminator. The estimated message at the output of the
limiter-discriminator is a baseband signal which we can limit to the message bandwidth $B_m$,
which significantly reduces the noise that we see at the output of the limiter-discriminator. Let
us now compute the output SNR. From (5.142), the signal power is given by


$$P_s = \overline{\left(k_f m(t)\right)^2} = k_f^2 P_m \tag{5.143}$$

The  noise  contribution  at  the  output  is  given  by 


$$z_n(t) = \frac{n_s'(t)}{2\pi A_c}$$

Since $d/dt \leftrightarrow j2\pi f$, $z_n(t)$ is obtained by passing $n_s(t)$ through an LTI system with
$G(f) = \frac{j2\pi f}{2\pi A_c} = jf/A_c$. Thus, the noise PSD at the output of the limiter-discriminator is given by

$$S_{z_n}(f) = |G(f)|^2 S_{n_s}(f) = f^2 N_0 / A_c^2, \qquad |f| \le B_{RF}/2 \tag{5.144}$$


Once we limit the bandwidth to the message bandwidth $B_m$ after the discriminator, the noise
power is given by

$$P_n = \int_{-B_m}^{B_m} S_{z_n}(f) \, df = \int_{-B_m}^{B_m} \frac{f^2 N_0}{A_c^2} \, df = \frac{2 B_m^3 N_0}{3 A_c^2} \tag{5.145}$$
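The closed form for the noise power can be sanity-checked by integrating the quadratic noise PSD numerically. The following Python sketch uses a simple midpoint rule; the function names and the numerical values of the message bandwidth, noise PSD, and carrier amplitude are assumptions picked only for illustration.

```python
def fm_noise_psd(f, n0, a_c):
    """Quadratic noise PSD at the limiter-discriminator output: f^2 N0 / A_c^2."""
    return f * f * n0 / (a_c * a_c)


def fm_noise_power_numeric(b_m, n0, a_c, steps=100000):
    """Midpoint-rule integral of the PSD over [-B_m, B_m]."""
    df = 2.0 * b_m / steps
    total = 0.0
    for k in range(steps):
        f = -b_m + (k + 0.5) * df
        total += fm_noise_psd(f, n0, a_c) * df
    return total


def fm_noise_power_closed_form(b_m, n0, a_c):
    """Closed form 2 B_m^3 N0 / (3 A_c^2)."""
    return 2.0 * b_m ** 3 * n0 / (3.0 * a_c ** 2)


# Assumed example values: 15 kHz message bandwidth, N0 = 1e-9, A_c = 1.
numeric = fm_noise_power_numeric(15e3, 1e-9, 1.0)
closed = fm_noise_power_closed_form(15e3, 1e-9, 1.0)
```

The two values should agree to many significant digits, and the cubic dependence on the message bandwidth shows why restricting the output to the message band is so effective at suppressing noise.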


From  (5.143)  and  (5.145),  we  obtain  that  the  SNR  is  given  by 


$$\mathrm{SNR}_{FM} = \frac{P_s}{P_n} = \frac{3 k_f^2 P_m A_c^2}{2 B_m^3 N_0}$$


It  is  interesting  to  benchmark  this  against  a  baseband  communication  system  in  which  the 
message  is  sent  directly  over  the  channel.  To  keep  the  comparison  fair,  we  fix  the  received  power 
to  that  of  the  passband  system  and  the  one-sided  noise  PSD  to  that  of  the  passband  white  noise. 
Thus, the received signal power is $P_r = A_c^2/2$, and the noise power is $N_0 B_m$, and the baseband
benchmark SNR is given by

$$\mathrm{SNR}_b = \frac{P_r}{N_0 B_m} = \frac{A_c^2}{2 N_0 B_m}$$


We  therefore  obtain  that 


$$\mathrm{SNR}_{FM} = \frac{3 k_f^2 P_m}{B_m^2} \, \mathrm{SNR}_b \tag{5.146}$$


Let  us  now  express  this  in  terms  of  some  interesting  parameters.  The  maximum  frequency 
deviation  in  the  FM  system  is  given  by 


$$\Delta f_{max} = k_f \max_t |m(t)|$$


and  the  modulation  index  is  defined  as  the  ratio  between  the  maximum  frequency  deviation  and 
the  message  bandwidth: 


$$\beta = \frac{\Delta f_{max}}{B_m}$$


Thus,  we  have 


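$$\frac{k_f^2 P_m}{B_m^2} = \frac{(\Delta f_{max})^2}{B_m^2} \, \frac{P_m}{\left(\max_t |m(t)|\right)^2} = \frac{\beta^2}{\mathrm{PAR}}$$

defining the peak-to-average power ratio (PAR) of the message as

$$\mathrm{PAR} = \frac{\left(\max_t |m(t)|\right)^2}{\overline{m^2(t)}} = \frac{\left(\max_t |m(t)|\right)^2}{P_m}$$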


Substituting  into  (5.146),  we  obtain  that 

$$\mathrm{SNR}_{FM} = \frac{3 \beta^2}{\mathrm{PAR}} \, \mathrm{SNR}_b \tag{5.147}$$

Thus, FM can improve upon the baseband benchmark by increasing the modulation index $\beta$.
This  is  an  example  of  a  power-bandwidth  tradeoff:  by  increasing  the  bandwidth  beyond  that 
strictly  necessary  for  sending  the  message,  we  have  managed  to  improve  the  SNR  compared  to  the 
baseband  benchmark.  However,  the  quadratic  power-bandwidth  tradeoff  offered  by  FM  is  highly 
suboptimal  compared  to  the  best  possible  tradeoffs  in  digital  communication  systems,  where 
one  can  achieve  exponential  tradeoffs.  Another  drawback  of  the  FM  power-bandwidth  tradeoff 
is  that  the  amount  of  SNR  improvement  depends  on  the  PAR  of  the  message:  messages  with 
larger  dynamic  range,  and  hence  larger  PAR,  will  see  less  improvement.  This  is  in  contrast  to 
digital  communication,  where  message  characteristics  do  not  affect  the  power-bandwidth  tradeoffs 
over  the  communication  link,  since  messages  are  converted  to  bits  via  source  coding  before 
transmission.  Of  course,  messages  with  larger  dynamic  range  may  well  require  more  bits  to 
represent  them  accurately,  and  hence  a  higher  rate  on  the  communication  link,  but  such  design 
choices  are  decoupled  from  the  parameters  governing  reliable  link  operation. 
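To get a feel for the numbers in (5.147), the Python sketch below estimates the PAR of a sampled message and the resulting SNR gain over the baseband benchmark. The helper names, the choice of a sinusoidal test message, and the modulation index of 5 are assumptions for illustration only.

```python
import math


def peak_to_average_ratio(samples):
    """PAR = (max |m|)^2 / mean(m^2), estimated from message samples."""
    peak = max(abs(x) for x in samples)
    power = sum(x * x for x in samples) / len(samples)
    return peak ** 2 / power


def fm_snr_gain(beta, par):
    """Ratio SNR_FM / SNR_b = 3 beta^2 / PAR, following (5.147)."""
    return 3.0 * beta ** 2 / par


# Assumed example: one full period of a sinusoid, for which PAR = 2.
n = 1000
sinusoid = [math.sin(2.0 * math.pi * k / n) for k in range(n)]
par_sine = peak_to_average_ratio(sinusoid)
gain_db = 10.0 * math.log10(fm_snr_gain(5.0, par_sine))
```

For this sinusoid and a modulation index of 5, the gain is 37.5, or a little under 16 dB. A message with a larger PAR (for example, speech with occasional loud peaks) would see a smaller gain at the same modulation index.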

Threshold effect: It appears from (5.147) that the output SNR can be improved simply by
increasing $\beta$. This is somewhat misleading. For a given message bandwidth $B_m$, increasing $\beta$
corresponds to increasing the RF bandwidth: $B_{RF} \approx 2(\beta + 1) B_m$ by Carson's formula. Thus,
an increase in $\beta$ corresponds to an increase in the power of the passband white noise at
the input of the limiter-discriminator, which is given by $N_0 B_{RF} = 2 N_0 B_m (\beta + 1)$. Thus, if we
increase $\beta$ for a fixed received power, the high SNR approximation underlying (5.140), and hence
the model (5.142) for the output of the limiter-discriminator, eventually breaks down. It is easy to
see this from the equation (5.139) for the phase perturbation due to noise:
$\theta_n(t) = \tan^{-1} \frac{n_s(t)}{A_c + n_c(t)}$. When $A_c$ is small, variations in
$n_c(t)$ can change the sign of the denominator, which leads to phase changes of $\pi$ over a small
time interval. This leads to impulses in the output of the discriminator. Indeed, as we start
reducing the SNR at the input to the discriminator for FM audio below the threshold where the
approximation (5.140) holds, we can actually hear these peaks as "clicks" in the audio output.
As we reduce the SNR further, the clicks swamp out the desired signal. This is called the FM
threshold effect.

To avoid this behavior, we must operate in the high-SNR regime where $A_c \gg |n_c|, |n_s|$, so that
the approximation (5.140) holds. In other words, the SNR for the passband signal at the input
to the limiter-discriminator must be above a threshold, say $\gamma$ (e.g., $\gamma = 10$ might be a good rule
of thumb), for FM demodulation to work well. This condition can be expressed as follows:

$$\frac{P_r}{N_0 B_{RF}} \ge \gamma \tag{5.148}$$


Thus,  in  order  to  utilize  a  large  RF  bandwidth  to  improve  SNR  at  the  output  of  the  limiter- 
discriminator,  the  received  signal  power  must  also  scale  with  the  available  bandwidth.  Using 
Carson’s  formula,  we  can  rewrite  (5.148)  in  terms  of  the  baseband  benchmark  as  follows: 


$$\mathrm{SNR}_b = \frac{P_r}{N_0 B_m} \ge 2\gamma(\beta + 1), \quad \text{condition for operation above threshold} \tag{5.149}$$

To  summarize,  the  power-bandwidth  tradeoff  (5.147)  applies  only  when  the  received  power  (or 
equivalently,  the  baseband  benchmark  SNR)  is  above  a  threshold  that  scales  with  the  bandwidth, 
as  specified  by  (5.149). 
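The threshold condition is easy to evaluate alongside Carson's formula. The following Python sketch is a minimal illustration; the function names are hypothetical, and the default threshold of 10 is simply the rule of thumb quoted above.

```python
import math


def carson_bandwidth(beta, b_m):
    """Approximate RF bandwidth 2 (beta + 1) B_m from Carson's formula."""
    return 2.0 * (beta + 1.0) * b_m


def min_snr_b_for_threshold(beta, gamma=10.0):
    """Smallest baseband benchmark SNR satisfying (5.149): 2 gamma (beta + 1)."""
    return 2.0 * gamma * (beta + 1.0)


def operates_above_threshold(snr_b, beta, gamma=10.0):
    """True if the baseband benchmark SNR meets the threshold condition."""
    return snr_b >= min_snr_b_for_threshold(beta, gamma)


# Assumed example: beta = 5 with a 15 kHz message.
b_rf = carson_bandwidth(5.0, 15e3)                 # 180 kHz
snr_b_min = min_snr_b_for_threshold(5.0)           # 120
snr_b_min_db = 10.0 * math.log10(snr_b_min)        # about 20.8 dB
```

Doubling the modulation index roughly doubles the required benchmark SNR, which is the sense in which the received power must scale with the bandwidth used.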
Preemphasis and Deemphasis


Since  the  noise  at  the  limiter-discriminator  output  has  a  quadratic  PSD  (see  (5.144)  and  Figure 
5.36),  higher  frequencies  in  the  message  see  more  noise  than  lower  frequencies.  A  commonly 
used  approach  to  alleviate  this  problem  is  to  boost  the  power  of  the  higher  message  frequencies 
at  the  transmitter  by  using  a  highpass  preemphasis  filter.  The  distortion  in  the  message  due  to 
preemphasis  is  undone  at  the  receiver  using  a  lowpass  deemphasis  filter,  which  attenuates  the 
higher  frequencies.  The  block  diagram  of  an  FM  system  using  such  an  approach  is  shown  in 
Figure  5.37. 


Figure  5.37:  Preemphasis  and  deemphasis  in  FM  systems. 


A  typical  choice  for  the  preemphasis  filter  is  a  highpass  filter  with  a  single  zero,  with  transfer 
function  of  the  form 

$$H_{PE}(f) = 1 + j2\pi f \tau_1$$

The corresponding deemphasis filter is a single pole lowpass filter with transfer function

$$H_{DE}(f) = \frac{1}{1 + j2\pi f \tau_1}$$

For FM audio broadcast, $\tau_1$ is chosen in the range 50-75 μs (e.g., 75 μs in the United States, 50 μs
in Europe). The $f^2$ noise scaling at the output of the limiter-discriminator is compensated by the
(approximately) $1/f^2$ scaling provided by $|H_{DE}(f)|^2$ beyond the cutoff frequency $f_{pd} = 1/(2\pi\tau_1)$
(the subscript indicates the use of preemphasis and deemphasis), which evaluates to 2.1 kHz for
$\tau_1$ = 75 μs.

Let us compute the SNR improvement obtained using this strategy. Assuming that the preemphasis
and deemphasis filters compensate each other exactly, the signal contribution to the
estimated message at the output of the deemphasis filter in Figure 5.37 is $k_f m(t)$, which equals
the signal contribution to the estimated message at the output of the limiter-discriminator in
Figure 5.35, which shows a system not using preemphasis/deemphasis. Since the signal contributions
in the estimated messages in both systems are the same, any improvement in SNR must
come from a reduction in the output noise. Thus, we wish to characterize the noise PSD and
power at the output of the deemphasis filter in Figure 5.37. To do this, note that the noise at
the output of the limiter-discriminator is the same as before:

$$z_n(t) = \frac{n_s'(t)}{2\pi A_c}$$

with PSD

$$S_{z_n}(f) = |G(f)|^2 S_{n_s}(f) = f^2 N_0 / A_c^2, \qquad |f| \le B_{RF}/2$$
The noise $v_n$ obtained by passing $z_n$ through the deemphasis filter has PSD

$$S_{v_n}(f) = |H_{DE}(f)|^2 S_{z_n}(f) = \frac{N_0}{A_c^2} \, \frac{f^2}{1 + (f/f_{pd})^2} = \frac{N_0 f_{pd}^2}{A_c^2} \left(1 - \frac{1}{1 + (f/f_{pd})^2}\right)$$


Integrating  over  the  message  bandwidth,  we  find  that  the  noise  power  in  the  estimated  message 
in  Figure  5.37  is  given  by 


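$$P_n = \int_{-B_m}^{B_m} S_{v_n}(f) \, df = \frac{2 N_0 f_{pd}^3}{A_c^2} \left(\frac{B_m}{f_{pd}} - \tan^{-1} \frac{B_m}{f_{pd}}\right) \tag{5.150}$$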


where we have used the substitution $\tan x = f/f_{pd}$ to evaluate the integral. As we have already
mentioned, the signal power is unchanged from the earlier analysis, so that the improvement in
SNR is given by the reduction in noise power compared with (5.145), which gives

$$\mathrm{SNR}_{gain} = \frac{\dfrac{2 B_m^3 N_0}{3 A_c^2}}{\dfrac{2 N_0 f_{pd}^3}{A_c^2} \left(\dfrac{B_m}{f_{pd}} - \tan^{-1} \dfrac{B_m}{f_{pd}}\right)} = \frac{1}{3} \, \frac{\left(B_m/f_{pd}\right)^3}{\dfrac{B_m}{f_{pd}} - \tan^{-1} \dfrac{B_m}{f_{pd}}} \tag{5.151}$$


For $f_{pd}$ = 2.1 kHz, corresponding to the United States guidelines for FM audio broadcast, and
an audio bandwidth $B_m$ = 15 kHz, the SNR gain in (5.151) evaluates to more than 13 dB.
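The quoted figure can be reproduced directly from (5.151). The Python sketch below is a minimal check; the function names are hypothetical, and the inputs are the values quoted in the text (a 75 μs time constant, a 2.1 kHz cutoff, and a 15 kHz audio bandwidth).

```python
import math


def cutoff_frequency(tau_1):
    """Deemphasis cutoff f_pd = 1 / (2 pi tau_1), in Hz for tau_1 in seconds."""
    return 1.0 / (2.0 * math.pi * tau_1)


def preemphasis_snr_gain(b_m, f_pd):
    """SNR gain (5.151): (1/3) x^3 / (x - atan(x)) with x = B_m / f_pd."""
    x = b_m / f_pd
    return (x ** 3 / 3.0) / (x - math.atan(x))


f_pd_us = cutoff_frequency(75e-6)                             # about 2.12 kHz
gain_db = 10.0 * math.log10(preemphasis_snr_gain(15e3, 2.1e3))
```

The gain depends only on the ratio of the message bandwidth to the cutoff frequency, so a shorter time constant (for example, 50 μs) raises the cutoff and reduces the gain.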


For  completeness,  we  give  the  formula  for  the  SNR  obtained  using  preemphasis  and  deemphasis 
as 


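$$\mathrm{SNR}_{FM,pd} = \frac{\left(B_m/f_{pd}\right)^3}{\dfrac{B_m}{f_{pd}} - \tan^{-1} \dfrac{B_m}{f_{pd}}} \, \frac{\beta^2}{\mathrm{PAR}} \, \mathrm{SNR}_b \tag{5.152}$$

which is obtained by taking the product of the SNR gain (5.151) and the SNR without
preemphasis/deemphasis given by (5.147).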
Chapter 6

Optimal  Demodulation 


As  we  saw  in  Chapter  4,  we  can  send  bits  over  a  channel  by  choosing  one  of  a  set  of  waveforms  to 
send.  For  example,  when  sending  a  single  16QAM  symbol,  we  are  choosing  one  of  16  passband 
waveforms: 

$$s_{b_c,b_s}(t) = b_c p(t) \cos 2\pi f_c t - b_s p(t) \sin 2\pi f_c t$$

where  bc,  bs  each  take  values  in  {±1,±3}.  We  are  thus  able  to  transmit  log2  16  =  4  bits  of 
information.  In  this  chapter,  we  establish  a  framework  for  recovering  these  4  bits  when  the 
received  waveform  is  a  noisy  version  of  the  transmitted  waveform.  More  generally,  we  consider 
the  fundamental  problem  of  M-ary  signaling  in  additive  white  Gaussian  noise  (AWGN):  one  of 
$M$ signals, $s_1(t), ..., s_M(t)$, is sent, and the received signal equals the transmitted signal plus white
Gaussian  noise  (WGN). 
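As a small illustration of the 16QAM example, the Python sketch below enumerates the 16 amplitude pairs and evaluates the passband waveform at a given time. The function names are hypothetical, and the rectangular pulse used as a default is an assumption made only to keep the example self-contained.

```python
import math

LEVELS = (-3, -1, 1, 3)


def qam16_amplitudes():
    """All 16 (b_c, b_s) pairs with each coordinate in {-3, -1, 1, 3}."""
    return [(b_c, b_s) for b_c in LEVELS for b_s in LEVELS]


def passband_sample(b_c, b_s, t, f_c, pulse=lambda t: 1.0):
    """Value of b_c p(t) cos(2 pi f_c t) - b_s p(t) sin(2 pi f_c t) at time t.

    The default pulse p(t) = 1 is an illustrative assumption.
    """
    phase = 2.0 * math.pi * f_c * t
    return b_c * pulse(t) * math.cos(phase) - b_s * pulse(t) * math.sin(phase)


points = qam16_amplitudes()
bits_per_symbol = math.log2(len(points))                      # 4 bits
avg_energy = sum(b_c ** 2 + b_s ** 2 for b_c, b_s in points) / len(points)
```

The receiver's task in the rest of the chapter is to decide which of these 16 pairs was sent after noise has been added to the waveform.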

At  the  receiver,  we  are  faced  with  a  hypothesis  testing  problem:  we  have  M  possible  hypotheses 
about  which  signal  was  sent,  and  we  have  to  make  our  “best”  guess  as  to  which  one  holds,  based 
on  our  observation  of  the  received  signal.  We  are  interested  in  finding  a  guessing  strategy,  more 
formally termed a decision rule, which is the "best" according to some criterion. For
communications applications, we are typically interested in finding a decision rule which minimizes the
probability  of  error  (i.e.,  the  probability  of  making  a  wrong  guess).  We  can  now  summarize  the 
goals  of  this  chapter  as  follows. 

Goals:  We  wish  to  design  optimal  receivers  when  the  received  signal  is  modeled  as  follows: 

$$H_i : y(t) = s_i(t) + n(t), \qquad i = 1, ..., M$$

where  Hi  is  the  ith  hypothesis,  corresponding  to  signal  Si(t)  being  transmitted,  and  where  n(t) 
is  white  Gaussian  noise.  We  then  wish  to  analyze  the  performance  of  such  receivers,  to  see  how 
performance  measures  such  as  the  probability  of  error  depend  on  system  parameters.  It  turns 
out that, for the preceding AWGN model, the performance depends only on the received
signal-to-noise ratio (SNR) and on the "shape" of the signal constellation $\{s_1(t), ..., s_M(t)\}$. Underlying
both  the  derivation  of  the  optimal  receiver  and  its  analysis  is  a  geometric  view  of  signals  and 
noise  as  vectors,  which  we  term  signal  space  concepts.  Once  we  have  this  background,  we  are  in 
a  position  to  discuss  elementary  power-bandwidth  tradeoffs.  For  example,  16QAM  has  higher 
bandwidth  efficiency  than  QPSK,  so  it  makes  sense  that  it  has  lower  power  efficiency;  that  is,  it 
requires  higher  SNR,  and  hence  higher  transmit  power,  for  the  same  probability  of  error.  We  will 
be  able  to  quantify  this  intuition,  previewed  in  Chapter  4,  based  on  the  material  in  this  chapter. 
We  will  also  be  able  to  perform  link  budget  calculations:  for  example,  how  much  transmit  power 
is  needed  to  attain  a  given  bit  rate  using  a  given  constellation  as  a  function  of  range,  and  transmit 
and  receive  antenna  gains? 

Chapter  Plan:  The  prerequisites  for  this  chapter  are  Chapter  4  (digital  modulation)  and  the 
material on Gaussian random variables (Section 5.6) and noise modeling (Section 5.8) in Chapter 5.
We build up the remaining background required to attain our goals in this chapter in a
step-by-step fashion, as follows.

Hypothesis testing: In Section 6.1, we establish the basic framework for hypothesis testing,
derive the form of optimal decision rules, and illustrate the application of this framework for
finite-dimensional  observations. 

Signal  space  concepts:  In  Section  6.2,  we  show  that  continuous  time  M-ary  signaling  in  AWGN 
can be reduced to an equivalent finite-dimensional system, in which transmitted signal vectors
are corrupted by vector WGN. This is done by projecting the continuous time signal into the
finite-dimensional signal space spanned by the set of possible transmitted signals, $s_1, ..., s_M$. We
apply  the  hypothesis  testing  framework  to  derive  the  optimal  receiver  for  the  finite-dimensional 
system,  and  from  this  we  infer  the  optimal  receiver  in  continuous  time. 

Performance  analysis:  In  Section  6.3,  we  analyze  the  performance  of  optimal  reception.  We  show 
that  performance  depends  only  on  SNR  and  the  relative  geometry  of  the  signal  constellation.  We 
provide  exact  error  probability  expressions  for  binary  signaling.  While  the  probability  of  error  for 
larger  signal  constellations  must  typically  be  computed  by  simulation  or  numerical  integration, 
we  obtain  bounds  and  approximations,  building  on  the  analysis  for  binary  signaling,  that  provide 
quick  insight  into  power-bandwidth  tradeoffs. 

Link budget analysis: In Section 6.5, we illustrate how performance analysis is applied to obtaining
the "link budget" for a typical radio link, which is the tool used to obtain coarse guidelines
for  the  design  of  hardware,  including  transmit  power,  transmit  and  receive  antennas,  and  receiver 
noise  figure. 

Software:  Software  Lab  6.1  in  this  chapter  builds  on  Software  Lab  4.1,  providing  a  hands-on 
feel  for  Nyquist  signaling  over  an  AWGN  channel.  In  turn,  we  build  on  this  lab  in  Software  Lab 
8.1,  which  adds  in  channel  dispersion  to  the  model. 

Notational  shortcut:  In  this  chapter,  we  make  extensive  use  of  the  notational  simplification 
discussed at the end of Section 5.3. Given a random variable $X$, a common notation for probability
density function or probability mass function is $p_X(x)$, with $X$ denoting the random variable,
and $x$ being a dummy variable which we might integrate out when computing probabilities.
However, when there is no scope for confusion, we use the less cumbersome (albeit incomplete)
notation $p(x)$, using the dummy variable $x$ not only as the argument of the density, but also
to indicate that the density corresponds to the random variable $X$. (Similarly, we would use
$p(y)$ to denote the density for a random variable $Y$.) The same convention is used for joint and
conditional densities as well. For random variables $X$ and $Y$, we use the notation $p(x, y)$ instead
of $p_{X,Y}(x, y)$, and $p(y|x)$ instead of $p_{Y|X}(y|x)$, to denote the joint and conditional densities,
respectively. 


6.1  Hypothesis  Testing 

In  Example  5.6.3,  we  considered  a  simple  model  for  binary  signaling,  in  which  the  receiver  sees 
a single sample $Y$. If 0 is sent, the conditional distribution of $Y$ is $N(0, v^2)$, while if 1 is sent,
the conditional distribution is $N(m, v^2)$. We analyzed a simple decision rule in which we guess
that 0 is sent if $Y \le m/2$, and guess that 1 is sent otherwise. Thus, we wish to decide between
two hypotheses (0 being sent or 1 being sent) based on an observation (the received sample $Y$).
The statistics of the observation depend on the hypothesis (this information is captured by the
conditional distributions of $Y$ given each hypothesis). We must now make a good guess as to
which  hypothesis  is  true,  based  on  the  value  of  the  observation.  The  guessing  strategy  is  called 
the  decision  rule,  which  maps  each  possible  value  of  Y  to  either  0  or  1. 

The  decision  rule  we  have  considered  in  Example  5.6.3  makes  sense,  splitting  the  difference 
between  the  conditional  means  of  Y  under  the  two  hypotheses.  But  is  it  always  the  best  thing 
to  do?  For  example,  if  we  know  for  sure  that  0  is  sent,  then  we  should  clearly  always  guess  that 
0 is sent, regardless of the value of Y that we see. As another example, if the noise variance is
different under the two hypotheses, then it is no longer clear that splitting the difference between
the  means  is  the  right  thing  to  do.  We  therefore  need  a  systematic  framework  for  hypothesis 
testing,  which  allows  us  to  derive  good  decision  rules  for  a  variety  of  statistical  models. 


In this section, we consider the general problem of $M$-ary hypothesis testing, in which we must
decide which of $M$ possible hypotheses, $H_0, ..., H_{M-1}$, "best explains" an observation $Y$. For
our purpose, the observation $Y$ can be a scalar or vector, and takes values in an observation
space $\Gamma$. The link between the hypotheses and observation is statistical: for each hypothesis
$H_i$, we know the conditional distribution of $Y$ given $H_i$. We denote the conditional density
of $Y$ given $H_i$ as $p(y|i)$, $i = 0, 1, ..., M-1$. We may also know the prior probabilities of the
hypotheses (i.e., the probability of each hypothesis prior to seeing the observation), denoted by
$\pi_i = P[H_i]$, $i = 0, 1, ..., M-1$, which satisfy $\sum_{i=0}^{M-1} \pi_i = 1$. The final ingredient of the hypothesis
testing framework is the decision rule: for each possible value $Y = y$ of the observation, we must
decide which of the $M$ hypotheses we will bet on. Denoting this guess as $\delta(y)$, the decision rule
$\delta(\cdot)$ is a mapping from the observation space $\Gamma$ to $\{0, 1, ..., M-1\}$, where $\delta(y) = i$ means that
we guess that $H_i$ is true when we see $Y = y$. The decision rule partitions the observation space
into decision regions, with $\Gamma_i$ denoting the set of values of $Y$ for which we guess $H_i$. That is,
$\Gamma_i = \{y \in \Gamma : \delta(y) = i\}$, $i = 0, 1, ..., M-1$. We summarize these ingredients of the hypothesis
testing  framework  as  follows. 


Ingredients  of  hypothesis  testing  framework 

•  Hypotheses $H_0, H_1, ..., H_{M-1}$

•  Observation $Y \in \Gamma$

•  Conditional densities $p(y|i)$, for $i = 0, 1, ..., M-1$

•  Prior probabilities $\pi_i = P[H_i]$, $i = 0, 1, ..., M-1$, with $\sum_{i=0}^{M-1} \pi_i = 1$

•  Decision rule $\delta : \Gamma \to \{0, 1, ..., M-1\}$

•  Decision regions $\Gamma_i = \{y \in \Gamma : \delta(y) = i\}$, $i = 0, 1, ..., M-1$


To make the concepts concrete, let us quickly recall Example 5.6.3, where we have $M = 2$
hypotheses, with $H_0 : Y \sim N(0, v^2)$ and $H_1 : Y \sim N(m, v^2)$. The "sensible" decision rule in this
example can be written as

$$\delta(y) = \begin{cases} 0, & y \le m/2 \\ 1, & y > m/2 \end{cases}$$

so that $\Gamma_0 = (-\infty, m/2]$ and $\Gamma_1 = (m/2, \infty)$. Note that this decision rule need not be optimal
if we know the prior probabilities. For example, if we know that $\pi_0 = 1$, we should say that $H_0$
is true, regardless of the value of $Y$: this would reduce the probability of error from $Q\left(\frac{m}{2v}\right)$ (for
the "sensible" rule) to zero!
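The midpoint rule and its error probability are simple to express in code. The Python sketch below is an illustration only: the function names are hypothetical, and the Q function is computed from the complementary error function in the standard library.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = P[N(0,1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def midpoint_rule(y, m):
    """Guess 0 if y <= m/2, and 1 otherwise."""
    return 0 if y <= m / 2.0 else 1


def midpoint_error_probability(m, v):
    """Error probability Q(m / (2 v)) of the midpoint rule.

    Both conditional error probabilities equal this value, so the average
    does not depend on the priors.
    """
    return q_function(m / (2.0 * v))


# Assumed example values: m = 2, v = 1, so the error probability is Q(1).
p_error = midpoint_error_probability(2.0, 1.0)
```

For these example values the error probability is Q(1), about 0.159, whereas a receiver that knows the prior probability of the first hypothesis is 1 can always guess 0 and never err.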


6.1.1  Error  probabilities 

The  performance  measures  of  interest  to  us  when  choosing  a  decision  rule  are  the  conditional 
error  probabilities  and  the  average  error  probability.  We  have  already  seen  these  in  Example 
5.6.3  for  binary  on-off  keying,  but  we  now  formally  define  them  for  a  general  M-ary  hypothesis 
testing problem. For a fixed decision rule $\delta$ with corresponding decision regions $\{\Gamma_i\}$, we define
the conditional probabilities of error as follows.

Conditional Error Probabilities: The conditional error probability, conditioned on $H_i$, where
$0 \le i \le M-1$, is defined as

$$P_{e|i} = P[\text{say } H_j \text{ for some } j \ne i \mid H_i \text{ is true}] = \sum_{j \ne i} P[Y \in \Gamma_j | H_i] = 1 - P[Y \in \Gamma_i | H_i] \tag{6.1}$$
Conditional Probabilities of Correct Decision: These are defined as

$$P_{c|i} = 1 - P_{e|i} = P[Y \in \Gamma_i | H_i] \tag{6.2}$$

Average  Error  Probability:  This  is  given  by  averaging  the  conditional  error  probabilities 
using  the  priors: 

$$P_e = \sum_{i=0}^{M-1} \pi_i P_{e|i} \tag{6.3}$$

Average Probability of Correct Decision: This is given by

$$P_c = \sum_{i=0}^{M-1} \pi_i P_{c|i} = 1 - P_e \tag{6.4}$$
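The averaging over priors can be written as a one-line computation. The following Python sketch is a minimal illustration; the function names and the example numbers are hypothetical.

```python
def average_error_probability(priors, conditional_errors):
    """Average error probability: sum over i of pi_i * P_{e|i}."""
    if len(priors) != len(conditional_errors):
        raise ValueError("need one conditional error probability per hypothesis")
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("prior probabilities must sum to one")
    return sum(p * e for p, e in zip(priors, conditional_errors))


def average_correct_probability(priors, conditional_errors):
    """Average probability of correct decision: 1 minus the average error."""
    return 1.0 - average_error_probability(priors, conditional_errors)


# Assumed example: two hypotheses with conditional error probabilities 0.1 and 0.3.
p_e_equal = average_error_probability([0.5, 0.5], [0.1, 0.3])
p_e_skewed = average_error_probability([0.8, 0.2], [0.1, 0.3])
```

With equal priors the average is 0.2; skewing the priors toward the hypothesis with the smaller conditional error lowers the average to 0.14, even though the decision rule is unchanged.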


6.1.2  ML  and  MAP  decision  rules 


For a general $M$-ary hypothesis testing problem, an intuitively pleasing decision rule is the
maximum likelihood rule, which, for a given observation $Y = y$, picks the hypothesis $H_i$ for which
the observed value $Y = y$ is most likely; that is, we pick $i$ so as to maximize the conditional
density  p(y\i). 

Notation:  We  denote  by  “arg  max”  the  argument  of  the  maximum.  That  is,  if  the  maximum 
of a function $f(x)$ occurs at $x_0$, then $x_0$ is the argument of the maximum:

$$\max_x f(x) = f(x_0), \qquad \arg\max_x f(x) = x_0$$


Note  also  that,  while  the  maximum  value  of  a  function  is  changed  if  we  apply  another  function 
to  it,  if  the  second  function  is  strictly  increasing,  then  the  argument  of  the  maximum  remains  the 
same.  For  example,  when  dealing  with  densities  taking  exponential  forms  (such  as  the  Gaussian), 
it  is  useful  to  apply  the  logarithm  (which  is  a  strictly  increasing  function),  as  we  note  for  the  ML 
rule  below. 

Maximum  Likelihood  (ML)  Decision  Rule:  The  ML  decision  rule  is  defined  as 
$$\delta_{ML}(y) = \arg\max_{0 \le i \le M-1} p(y|i) = \arg\max_{0 \le i \le M-1} \log p(y|i) \tag{6.5}$$
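A minimal sketch of the ML rule for scalar Gaussian observations, written in Python, is shown below. It works with log densities, as suggested above. The function names and the two example models are assumptions for illustration; the second model uses unequal variances to show that the ML rule is then no longer a midpoint threshold.

```python
import math


def gaussian_log_density(y, mean, var):
    """Log of the N(mean, var) density evaluated at y."""
    return -0.5 * math.log(2.0 * math.pi * var) - (y - mean) ** 2 / (2.0 * var)


def ml_decision(y, hypotheses):
    """Index i maximizing log p(y|i).

    hypotheses is a list of (mean, variance) pairs, one per hypothesis.
    Ties are broken in favor of the smallest index.
    """
    scores = [gaussian_log_density(y, mean, var) for mean, var in hypotheses]
    return max(range(len(scores)), key=lambda i: scores[i])


# Assumed example models.
equal_variance = [(0.0, 1.0), (2.0, 1.0)]    # ML threshold at the midpoint, y = 1
unequal_variance = [(0.0, 1.0), (0.0, 4.0)]  # same mean, different noise variance
```

For the equal-variance model, ML reduces to comparing the observation with the midpoint between the means. For the unequal-variance model, the means coincide, and the rule instead picks the higher-variance hypothesis whenever the magnitude of the observation is large enough.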


Another  decision  rule  that  “makes  sense”  is  the  Maximum  A  Posteriori  Probability  (MAP)  rule, 
where  we  pick  the  hypothesis  which  is  most  likely,  conditioned  on  the  value  of  the  observation. 
The conditional probabilities $P[H_i | Y = y]$ are called the a posteriori, or posterior, probabilities,
since  they  are  probabilities  that  we  can  compute  after  we  see  the  observation  Y  =  y.  Let  us 
work  through  what  this  rule  is  actually  doing.  Using  Bayes’  rule,  the  posterior  probabilities  are 
given  by 


$$P[H_i | Y = y] = \frac{p(y|i) P[H_i]}{p(y)} = \frac{\pi_i p(y|i)}{p(y)}, \qquad i = 0, 1, ..., M-1$$


Since  we  want  to  maximize  this  over  i,  the  denominator  p(y),  the  unconditional  density  of  Y, 
can  be  ignored  in  the  maximization.  We  can  also  take  the  log  as  we  did  for  the  ML  rule.  The 
MAP  rule  can  therefore  be  summarized  as  follows. 

Maximum  A  Posteriori  Probability  (MAP)  Rule:  The  MAP  decision  rule  is 
defined  as 

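$$\begin{aligned} \delta_{MAP}(y) &= \arg\max_{0 \le i \le M-1} P[H_i | Y = y] \\ &= \arg\max_{0 \le i \le M-1} \pi_i p(y|i) = \arg\max_{0 \le i \le M-1} \left(\log \pi_i + \log p(y|i)\right) \end{aligned} \tag{6.6}$$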
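The MAP rule differs from ML only by the added log-prior term. The Python sketch below illustrates this for scalar Gaussian observations; the function names and example priors are assumptions, and a zero prior is mapped to a score of minus infinity so that such a hypothesis is never selected.

```python
import math


def gaussian_log_density(y, mean, var):
    """Log of the N(mean, var) density evaluated at y."""
    return -0.5 * math.log(2.0 * math.pi * var) - (y - mean) ** 2 / (2.0 * var)


def map_decision(y, priors, hypotheses):
    """Index i maximizing log pi_i + log p(y|i).

    hypotheses is a list of (mean, variance) pairs; priors is the matching
    list of prior probabilities. Ties go to the smallest index.
    """
    scores = []
    for prior, (mean, var) in zip(priors, hypotheses):
        log_prior = math.log(prior) if prior > 0.0 else -math.inf
        scores.append(log_prior + gaussian_log_density(y, mean, var))
    return max(range(len(scores)), key=lambda i: scores[i])


# Assumed example: H0 is N(0, 1) and H1 is N(2, 1).
model = [(0.0, 1.0), (2.0, 1.0)]
```

With equal priors the threshold sits at the midpoint y = 1. With priors of 0.9 and 0.1, the threshold moves to 1 + (ln 9)/2, roughly 2.1, so an observation of 1.5 is now attributed to the more probable hypothesis.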
Properties of the MAP rule:

•  The  MAP  rule  reduces  to  the  ML  rule  for  equal  priors. 

•  The  MAP  rule  minimizes  the  probability  of  error.  In  other  words,  it  is  also  the  Minimum 
Probability  of  Error  (MPE)  rule. 

The first property follows from (6.6) by setting $\pi_i = 1/M$: in this case $\pi_i$ does not depend on $i$
and  can  therefore  be  dropped  when  maximizing  over  i.  The  second  property  is  important  enough 
to  restate  and  prove  as  a  theorem. 

Theorem  6.1.1  The  MAP  rule  (6.6)  minimizes  the  probability  of  error. 

Proof  of  Theorem  6.1.1:  We  show  that  the  MAP  rule  maximizes  the  probability  of  correct 
decision. To do this, consider an arbitrary decision rule $\delta$, with corresponding decision regions
$\{\Gamma_i\}$. The conditional probabilities of correct decision are given by

$$P_{c|i} = P[Y \in \Gamma_i | H_i] = \int_{\Gamma_i} p(y|i) \, dy, \qquad i = 0, 1, ..., M-1$$

so that the average probability of correct decision is

$$P_c = \sum_{i=0}^{M-1} \pi_i P_{c|i} = \sum_{i=0}^{M-1} \pi_i \int_{\Gamma_i} p(y|i) \, dy$$

Any point $y \in \Gamma$ can belong in exactly one of the $M$ decision regions. If we decide to put it in
$\Gamma_i$, then the point contributes the term $\pi_i p(y|i)$ to the integrand. Since we wish to maximize
the overall integral, we choose to put $y$ in the decision region for which it makes the largest
contribution to the integrand. Thus, we put it in $\Gamma_i$ so as to maximize $\pi_i p(y|i)$, which is precisely
the MAP rule (6.6).


□ 


Figure 6.1: Hypothesis testing with exponentially distributed observations.

Example 6.1.1 (Hypothesis testing with exponentially distributed observations): A
binary hypothesis problem is specified as follows:

$$H_0: Y \sim \mathrm{Exp}(1), \quad H_1: Y \sim \mathrm{Exp}(1/4)$$

where $\mathrm{Exp}(\mu)$ denotes an exponential distribution with density $\mu e^{-\mu y}$, CDF $1 - e^{-\mu y}$ and
complementary CDF $e^{-\mu y}$, where $y \ge 0$ (all the probability mass falls on the nonnegative numbers).
Note that the mean of an $\mathrm{Exp}(\mu)$ random variable is $1/\mu$. Thus, in our case, the mean under $H_0$
is 1, while the mean under $H_1$ is 4.

(a) Find the ML rule and the corresponding conditional error probabilities.

(b) Find the MPE rule when the prior probability of $H_1$ is 1/5. Also find the conditional and
average error probabilities.

Solution:

(a) As shown in Figure 6.1, we have

$$p(y|0) = e^{-y} I_{\{y \ge 0\}}, \quad p(y|1) = (1/4)\, e^{-y/4} I_{\{y \ge 0\}}$$

The ML rule is given by

$$p(y|1) \underset{H_0}{\overset{H_1}{\gtrless}} p(y|0)$$

which reduces to

$$(1/4)\, e^{-y/4} \underset{H_0}{\overset{H_1}{\gtrless}} e^{-y} \quad (y \ge 0)$$

Taking logarithms on both sides and simplifying, we obtain that the ML rule is given by

$$y \underset{H_0}{\overset{H_1}{\gtrless}} (4/3) \log 4 = 1.8484$$

The conditional error probabilities are

$$P_{e|0} = P[\text{say } H_1 \mid H_0] = P[Y > (4/3)\log 4 \mid H_0] = e^{-(4/3)\log 4} = (1/4)^{4/3} = 0.1575$$

$$P_{e|1} = P[\text{say } H_0 \mid H_1] = P[Y \le (4/3)\log 4 \mid H_1] = 1 - e^{-(1/3)\log 4} = 1 - (1/4)^{1/3} = 0.37$$

These conditional error probabilities are rather high, telling us that exponentially distributed
observations with different means do not give us high-quality information about the hypotheses.

(b) The MPE rule is given by

$$\pi_1\, p(y|1) \underset{H_0}{\overset{H_1}{\gtrless}} \pi_0\, p(y|0)$$

which reduces to

$$(1/5)(1/4)\, e^{-y/4} \underset{H_0}{\overset{H_1}{\gtrless}} (4/5)\, e^{-y}$$

This gives

$$y \underset{H_0}{\overset{H_1}{\gtrless}} \frac{4}{3} \log 16 = 3.6968$$

Proceeding as in (a), we obtain

$$P_{e|0} = e^{-(4/3)\log 16} = (1/16)^{4/3} = 0.0248$$

$$P_{e|1} = 1 - e^{-(1/3)\log 16} = 1 - (1/16)^{1/3} = 0.6031$$

with average error probability

$$P_e = \pi_0 P_{e|0} + \pi_1 P_{e|1} = (4/5) \times 0.0248 + (1/5) \times 0.6031 = 0.1405$$

Since the prior probability of $H_1$ is small, the MPE rule is biased towards guessing that $H_0$ is
true. In this case, the decision rule is so skewed that the conditional probability of error under
$H_1$ is actually worse than a random guess. Taking this one step further, if the prior probability
of $H_1$ actually becomes zero, then the MPE rule would always guess that $H_0$ is true. In this case,
the conditional probability of error under $H_1$ would be one! This shows that we must be careful
about modeling when applying the MAP rule: if we are wrong about our prior probabilities, and
$H_1$ does occur with nonzero probability, then our performance would be quite poor.
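The numbers in Example 6.1.1 are easy to check numerically. The following is a minimal sketch, assuming only the Python standard library; the function names `mpe_threshold` and `conditional_errors` are our own choices for illustration and do not come from the text.

```python
import math

def mpe_threshold(mu0, mu1, pi0):
    """Threshold t of the rule 'say H1 if y > t' for H0: Exp(mu0) vs H1: Exp(mu1).

    Assumes mu0 > mu1, so that the likelihood ratio increases with y.
    Setting pi0 = 0.5 gives the ML rule.
    """
    pi1 = 1.0 - pi0
    # pi1*mu1*exp(-mu1*y) > pi0*mu0*exp(-mu0*y)
    #   <=>  y > log(pi0*mu0 / (pi1*mu1)) / (mu0 - mu1)
    t = math.log((pi0 * mu0) / (pi1 * mu1)) / (mu0 - mu1)
    return max(t, 0.0)  # observations are nonnegative

def conditional_errors(t, mu0, mu1):
    """Return (Pe|0, Pe|1) for the threshold rule 'say H1 if y > t'."""
    pe0 = math.exp(-mu0 * t)        # P[Y > t | H0]: complementary CDF under H0
    pe1 = 1.0 - math.exp(-mu1 * t)  # P[Y <= t | H1]: CDF under H1
    return pe0, pe1

# Part (a): ML rule (equal priors)
t_ml = mpe_threshold(1.0, 0.25, 0.5)
pe0_ml, pe1_ml = conditional_errors(t_ml, 1.0, 0.25)

# Part (b): MPE rule with prior 1/5 on H1, i.e. pi0 = 4/5
t_mpe = mpe_threshold(1.0, 0.25, 0.8)
pe0_mpe, pe1_mpe = conditional_errors(t_mpe, 1.0, 0.25)
pe_mpe = 0.8 * pe0_mpe + 0.2 * pe1_mpe

print(f"ML : threshold={t_ml:.4f}, Pe|0={pe0_ml:.4f}, Pe|1={pe1_ml:.4f}")
print(f"MPE: threshold={t_mpe:.4f}, Pe|0={pe0_mpe:.4f}, Pe|1={pe1_mpe:.4f}, Pe={pe_mpe:.4f}")
```

Changing `pi0` shows how the threshold moves with the priors: as the prior on $H_1$ shrinks, the threshold grows and the error probability under $H_1$ approaches one.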


Both  the  ML  and  MAP  rules  involve  comparison  of  densities,  and  it  is  convenient  to  express 
them  in  terms  of  a  ratio  of  densities,  or  likelihood  ratio,  as  discussed  next. 


Binary hypothesis testing and the likelihood ratio: For binary hypothesis testing, the ML
rule (6.5) reduces to

$$p(y|1) \underset{H_0}{\overset{H_1}{\gtrless}} p(y|0), \quad \text{or} \quad \frac{p(y|1)}{p(y|0)} \underset{H_0}{\overset{H_1}{\gtrless}} 1 \tag{6.7}$$

The ratio of conditional densities appearing above is defined to be the likelihood ratio (LR) $L(y)$,
a function of fundamental importance in hypothesis testing. Formally, we define the likelihood
ratio as

$$L(y) = \frac{p(y|1)}{p(y|0)}, \quad y \in \Gamma \tag{6.8}$$


Likelihood ratio test: A likelihood ratio test (LRT) is a decision rule in which we compare the
likelihood ratio to a threshold:

$$L(y) \underset{H_0}{\overset{H_1}{\gtrless}} \gamma$$

where the choice of $\gamma$ depends on our performance criterion. An equivalent form is the log
likelihood ratio test (LLRT), where the log of the likelihood ratio is compared with a threshold.

We have already shown in (6.7) that the ML rule is an LRT with threshold $\gamma = 1$. From (6.6),
we see that the MAP, or MPE, rule is also an LRT:

$$\pi_1\, p(y|1) \underset{H_0}{\overset{H_1}{\gtrless}} \pi_0\, p(y|0), \quad \text{or} \quad \frac{p(y|1)}{p(y|0)} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{\pi_0}{\pi_1}$$


This  is  important  enough  to  restate  formally. 


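ML and MPE rules are likelihood ratio tests.

$$L(y) \underset{H_0}{\overset{H_1}{\gtrless}} 1 \quad \text{or} \quad \log L(y) \underset{H_0}{\overset{H_1}{\gtrless}} 0 \qquad \text{ML rule} \tag{6.9}$$

$$L(y) \underset{H_0}{\overset{H_1}{\gtrless}} \frac{\pi_0}{\pi_1} \quad \text{or} \quad \log L(y) \underset{H_0}{\overset{H_1}{\gtrless}} \log \frac{\pi_0}{\pi_1} \qquad \text{MAP/MPE rule} \tag{6.10}$$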


We now specialize further to the setting of Example 5.6.3. The conditional densities are as shown
in Figure 6.2. Since this example is fundamental to our understanding of signaling in AWGN,
let us give it a name, the basic Gaussian example, and summarize the set-up in the language of
hypothesis testing:

$$H_0: Y \sim N(0, v^2), \quad H_1: Y \sim N(m, v^2) \tag{6.11}$$


Figure 6.2: Conditional densities for the basic Gaussian example.


Likelihood ratio for basic Gaussian example: Substituting (6.11) into (6.8) and simplifying
(this is left as an exercise), we obtain that the likelihood ratio for the basic Gaussian example is

$$L(y) = \exp\left(\frac{1}{v^2}\left(my - \frac{m^2}{2}\right)\right), \qquad \log L(y) = \frac{1}{v^2}\left(my - \frac{m^2}{2}\right) \tag{6.12}$$

ML and MAP rules for basic Gaussian example: Using (6.12) in (6.9), we leave it as an
exercise to check that the ML rule reduces to

$$Y \underset{H_0}{\overset{H_1}{\gtrless}} m/2, \qquad \text{ML rule } (m > 0) \tag{6.13}$$

(check that the inequalities get reversed for $m < 0$). This is exactly the "sensible" rule that we
analyzed in Example 5.6.3. Using (6.12) in (6.10), we obtain the MAP rule:

$$Y \underset{H_0}{\overset{H_1}{\gtrless}} \frac{m}{2} + \frac{v^2}{m} \log \frac{\pi_0}{\pi_1}, \qquad \text{MAP rule } (m > 0) \tag{6.14}$$


Example 6.1.2 (ML versus MAP for the basic Gaussian example): For the basic Gaussian
example, we now know that the decision rule in Example 5.6.3 is the ML rule, and we
showed in that example that the performance of this rule is given by

$$P_{e|0} = P_{e|1} = P_e = Q\left(\frac{m}{2v}\right) = Q\left(\sqrt{SNR/2}\right)$$

We also saw that at 13 dB SNR, the error probability for the ML rule is

$$P_{e,ML} = 7.8 \times 10^{-4}$$

regardless of the prior probabilities. For equal priors, the ML rule is also MPE, and we cannot
hope to do better than this. Let us now see what happens when the prior probability of $H_0$ is
$\pi_0 = 1/3$. The ML rule is no longer MPE, and we should be able to do better by using the MAP
rule. We leave it as an exercise to show that the conditional error probabilities for the MAP rule
are given by

$$P_{e|0} = Q\left(\frac{m}{2v} + \frac{v}{m} \log \frac{\pi_0}{\pi_1}\right), \qquad P_{e|1} = Q\left(\frac{m}{2v} - \frac{v}{m} \log \frac{\pi_0}{\pi_1}\right) \tag{6.15}$$

Plugging in the numbers for SNR of 13 dB and $\pi_0 = 1/3$, we obtain

$$P_{e|0} = 1.1 \times 10^{-3}, \quad P_{e|1} = 5.34 \times 10^{-4}$$

which averages to

$$P_{e,MAP} = 7.3 \times 10^{-4}$$

a slight improvement on the error probability of the ML rule.

Figure  6.3  shows  the  results  of  further  numerical  experiments  (see  caption  for  discussion). 
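Comparisons of this kind can be reproduced with a few lines of code. The sketch below is one possible way to evaluate (6.15), assuming the Python standard library; the helper names `q_func` and `map_error_probs` are ours. It uses the relation $m/(2v) = \sqrt{SNR/2}$ from Example 6.1.2, and takes the prior $\pi_0$ as an input so that both the dependence on SNR and the dependence on the priors can be explored.

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x) = P[N(0,1) > x], via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def map_error_probs(snr_db, pi0):
    """Conditional and average error probabilities of the MAP rule, following (6.15)."""
    snr = 10.0 ** (snr_db / 10.0)
    a = math.sqrt(snr / 2.0)                 # a = m / (2 v)
    m_over_v = 2.0 * a
    pi1 = 1.0 - pi0
    shift = math.log(pi0 / pi1) / m_over_v   # (v / m) * log(pi0 / pi1)
    pe0 = q_func(a + shift)
    pe1 = q_func(a - shift)
    return pe0, pe1, pi0 * pe0 + pi1 * pe1

# Equal priors: the MAP rule coincides with the ML rule.
_, _, pe_ml = map_error_probs(13.0, 0.5)

# Unequal priors: the MAP rule should do at least as well as the ML rule.
pe0_map, pe1_map, pe_map = map_error_probs(13.0, 1.0 / 3.0)

print(f"ML  at 13 dB: Pe = {pe_ml:.2e}")
print(f"MAP at 13 dB: Pe|0 = {pe0_map:.2e}, Pe|1 = {pe1_map:.2e}, Pe = {pe_map:.2e}")
```

Sweeping `snr_db` with `pi0` fixed, or `pi0` with `snr_db` fixed, gives curves of the kind described for Figure 6.3.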


Figure 6.3: Conditional and average error probabilities for the MAP receiver compared to the
error probability for the ML receiver. Panel (a) shows the dependence on SNR ($\pi_0 = 0.3$), and
panel (b) the dependence on priors (SNR = 10 dB). We consider the basic Gaussian example, fixing the priors
and varying SNR in (a), and fixing SNR and varying the priors in (b). For the MAP rule,
the conditional error probability given a hypothesis increases as the prior probability of the
hypothesis decreases. The average error probability for the MAP rule is always smaller than the
ML rule (which is the MAP rule for equal priors) when $\pi_0 \neq 1/2$. The MAP error probability
tends towards zero as $\pi_0 \to 0$ or $\pi_0 \to 1$.


6.1.3 Soft Decisions

We have so far considered hard decision rules in which we must choose exactly one of the $M$
hypotheses. In doing so, we are throwing away a lot of information in the observation. For
example, suppose that we are testing $H_0: Y \sim N(0, 4)$ versus $H_1: Y \sim N(10, 4)$ with equal
priors, so that the MPE rule is $Y \underset{H_0}{\overset{H_1}{\gtrless}} 5$. We would guess $H_1$ if $Y = 5.1$ as well as if $Y = 10.3$,
but we would be a lot more confident about our guess in the latter instance. Rather than
throwing away this information, we can employ soft decisions that convey reliability information
which could be used at a higher layer, for example, by a decoder which is processing a codeword
consisting of many bits.

Actually, we already know how to compute soft decisions: the posterior probabilities $P[H_i \mid Y = y]$,
$i = 0, 1, \ldots, M-1$, that appear in the MAP rule are actually the most information that we can
hope to get about the hypotheses from the observation. For notational compactness, let us
denote these by $\pi_i(y)$. The posterior probabilities can be computed using Bayes' rule as follows:

$$\pi_i(y) = P[H_i \mid Y = y] = \frac{\pi_i\, p(y|i)}{p(y)} = \frac{\pi_i\, p(y|i)}{\sum_{j=0}^{M-1} \pi_j\, p(y|j)} \tag{6.16}$$


In  practice,  we  may  settle  for  quantized  soft  decisions  which  convey  less  information  than  the 
posterior  probabilities  due  to  tradeoffs  in  precision  or  complexity  versus  performance. 


Example 6.1.3 (Soft decisions for 4PAM in AWGN): Consider a 4-ary hypothesis testing
problem modeled as follows:

$$H_0: Y \sim N(-3A, \sigma^2), \quad H_1: Y \sim N(-A, \sigma^2), \quad H_2: Y \sim N(A, \sigma^2), \quad H_3: Y \sim N(3A, \sigma^2)$$

This is a model that arises for 4PAM signaling in AWGN, as we see later. For $\sigma^2 = 1$, $A = 1$
and $Y = -1.5$, find the posterior probabilities if $\pi_0 = 0.4$ and $\pi_1 = \pi_2 = \pi_3 = 0.2$.

Solution: The posterior probability for the $i$th hypothesis is of the form

$$\pi_i(y) = c\, \pi_i\, e^{-\frac{(y - m_i)^2}{2\sigma^2}}$$

where $m_i \in \{\pm A, \pm 3A\}$ is the conditional mean under $H_i$, and where $c$ is a constant that does
not depend on $i$. Since the posterior probabilities must sum to one, we have

$$\sum_{j=0}^{3} \pi_j(y) = c \sum_{j=0}^{3} \pi_j\, e^{-\frac{(y - m_j)^2}{2\sigma^2}} = 1$$

Solving for $c$, we obtain

$$\pi_i(y) = \frac{\pi_i\, e^{-\frac{(y - m_i)^2}{2\sigma^2}}}{\sum_{j=0}^{3} \pi_j\, e^{-\frac{(y - m_j)^2}{2\sigma^2}}}$$

Plugging in the numbers, we obtain

$$\pi_0(-1.5) = 0.4121, \quad \pi_1(-1.5) = 0.5600, \quad \pi_2(-1.5) = 0.0279, \quad \pi_3(-1.5) = 2.5 \times 10^{-5}$$

The MPE hard decision in this case is $\delta_{MPE}(-1.5) = 1$, but note that the posterior probability
for $H_0$ is also quite high, which is information which would have been thrown away if only
hard decisions were reported. However, if the noise strength is reduced, then the hard decision
becomes more reliable. For example, for $\sigma^2 = 0.1$, we obtain

$$\pi_0(-1.5) = 9.08 \times 10^{-5}, \quad \pi_1(-1.5) = 0.9999, \quad \pi_2(-1.5) = 9.36 \times 10^{-14}, \quad \pi_3(-1.5) = 3.72 \times 10^{-44}$$

where it is not wise to trust some of the smaller numbers. Thus, we can be quite confident about
the hard decision from the MPE rule in this case.
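The computation in Example 6.1.3 can be sketched in a few lines. This is an illustrative implementation of (6.16) for Gaussian hypotheses with a common variance, assuming only the Python standard library; the function name `posteriors` is our own.

```python
import math

def posteriors(y, means, priors, sigma2):
    """Posterior probabilities pi_i(y) for H_i: Y ~ N(means[i], sigma2), as in (6.16).

    The common factor 1/sqrt(2*pi*sigma2) cancels between numerator and
    denominator, so it is omitted.
    """
    weights = [p * math.exp(-(y - m) ** 2 / (2.0 * sigma2))
               for m, p in zip(means, priors)]
    total = sum(weights)
    return [w / total for w in weights]

A = 1.0
means = [-3 * A, -A, A, 3 * A]
priors = [0.4, 0.2, 0.2, 0.2]

post_noisy = posteriors(-1.5, means, priors, sigma2=1.0)
post_clean = posteriors(-1.5, means, priors, sigma2=0.1)

# The MPE hard decision is the index of the largest posterior.
hard_decision = max(range(len(post_noisy)), key=lambda i: post_noisy[i])

print("sigma^2 = 1.0:", ["%.4g" % p for p in post_noisy], "-> decide", hard_decision)
print("sigma^2 = 0.1:", ["%.4g" % p for p in post_clean])
```

For much smaller noise variances the exponentials can underflow to zero; a more careful version would work with log weights and subtract the largest exponent before exponentiating.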


For  binary  hypothesis  testing,  it  suffices  to  output  one  of  the  two  posterior  probabilities,  since 
they  sum  to  one.  However,  it  is  often  more  convenient  to  output  the  log  of  the  ratio  of  the 
posteriors,  termed  the  log  likelihood  ratio  (LLR): 


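$$LLR(y) = \log \frac{P[H_1 \mid Y = y]}{P[H_0 \mid Y = y]} = \log \frac{\pi_1\, p(y|1)}{\pi_0\, p(y|0)} = \log \frac{\pi_1}{\pi_0} + \log \frac{p(y|1)}{p(y|0)} \tag{6.17}$$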


Notice  how  the  information  from  the  priors  and  the  information  from  the  observations,  each  of 
which  also  takes  the  form  of  an  LLR,  add  up  in  the  overall  LLR.  This  simple  additive  combining  of 
information  is  exploited  in  sophisticated  decoding  algorithms  in  which  information  from  one  part 
of  the  decoder  provides  priors  for  another  part  of  the  decoder.  Note  that  the  LLR  contribution 
due  to  the  priors  is  zero  for  equal  priors. 
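The additive structure of the LLR is easy to verify numerically. The sketch below is a minimal illustration, assuming Gaussian conditional densities with a common variance and the Python standard library; the names `gaussian_pdf`, `llr_terms` and `posterior_h1` are ours. It uses the earlier soft decision example, $H_0: Y \sim N(0,4)$ versus $H_1: Y \sim N(10,4)$.

```python
import math

def gaussian_pdf(y, mean, var):
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def llr_terms(y, pi1, mean0, mean1, var):
    """Return (LLR due to the priors, LLR due to the observation)."""
    prior_llr = math.log(pi1 / (1.0 - pi1))
    obs_llr = math.log(gaussian_pdf(y, mean1, var) / gaussian_pdf(y, mean0, var))
    return prior_llr, obs_llr

def posterior_h1(y, pi1, mean0, mean1, var):
    """P[H1 | Y = y], computed directly from Bayes' rule."""
    num = pi1 * gaussian_pdf(y, mean1, var)
    return num / (num + (1.0 - pi1) * gaussian_pdf(y, mean0, var))

# Equal priors: the prior term vanishes and the observation term conveys reliability.
_, llr_weak = llr_terms(5.1, 0.5, 0.0, 10.0, 4.0)     # barely above the threshold 5
_, llr_strong = llr_terms(10.3, 0.5, 0.0, 10.0, 4.0)  # far above the threshold

# Unequal priors: the two contributions add up to the log ratio of the posteriors.
prior_part, obs_part = llr_terms(5.1, 0.2, 0.0, 10.0, 4.0)
p1 = posterior_h1(5.1, 0.2, 0.0, 10.0, 4.0)
llr_from_posteriors = math.log(p1 / (1.0 - p1))

print(f"LLR at y=5.1: {llr_weak:.2f}, LLR at y=10.3: {llr_strong:.2f}")
print(f"prior + observation = {prior_part + obs_part:.4f}, from posteriors = {llr_from_posteriors:.4f}")
```

Both observations lead to the same hard decision, but the much larger LLR at $y = 10.3$ reflects the greater confidence in that decision.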


Example 6.1.4 (LLRs for binary antipodal signaling): Consider $H_1: Y \sim N(A, \sigma^2)$ versus
$H_0: Y \sim N(-A, \sigma^2)$. We shall see later how this model arises for binary antipodal signaling in
AWGN. We leave it as an exercise to show that the LLR is given by

$$LLR(y) = \frac{2Ay}{\sigma^2}$$

for equal priors.


6.2 Signal Space Concepts


We have seen in the previous section that the statistical relation between the hypotheses $\{H_i\}$
and the observation $Y$ is expressed in terms of the conditional densities $p(y|i)$. We are now
interested in applying this framework to derive optimal decision rules (and the receiver structures
required to implement them) for the problem of $M$-ary signaling in AWGN. In the language of
hypothesis testing, the observation here is the received signal $y(t)$ modeled as follows:

$$H_i: y(t) = s_i(t) + n(t), \quad i = 0, 1, \ldots, M-1 \tag{6.18}$$

where $s_i(t)$ is the transmitted signal corresponding to hypothesis $H_i$, and $n(t)$ is WGN with PSD
$\sigma^2 = N_0/2$. Before we can apply the framework of the previous section, however, we must figure
out how to define conditional densities when the observation is a continuous-time signal. Here
is how we do it:

•  We first observe that, while the signals $s_i(t)$ live in an infinite-dimensional, continuous-time
space, if we are only interested in the $M$ signals that could be transmitted under each of the $M$
hypotheses, then we can limit attention to a finite-dimensional subspace of dimension at most
$M$. We call this the signal space. We can then express the signals as vectors corresponding to
an expansion with respect to an orthonormal basis for the subspace.

•  The  projection  of  WGN  onto  the  signal  space  gives  us  a  noise  vector  whose  components  are 
i.i.d.  Gaussian.  Furthermore,  we  observe  that  the  component  of  the  received  signal  orthogonal  to 
the  signal  space  is  irrelevant:  that  is,  we  can  throw  it  away  without  compromising  performance. 

•  We  can  therefore  restrict  attention  to  projection  of  the  received  signal  onto  the  signal  space 
without  loss  of  performance.  This  projection  can  be  expressed  as  a  finite-dimensional  vector 
which  is  modeled  as  a  discrete  time  analogue  of  (6.18).  We  can  now  apply  the  hypothesis  testing 
framework  of  Section  6.1  to  infer  the  optimal  (ML  and  MPE)  decision  rules. 

•  We  then  translate  the  optimal  decision  rules  back  to  continuous  time  to  infer  the  structure  of 
the  optimal  receiver. 


6.2.1  Representing  signals  as  vectors 


Let us begin with an example illustrating how continuous-time signals can be represented as
finite-dimensional vectors by projecting onto the signal space.

Figure 6.4: For linear modulation with no intersymbol interference, the complex symbols themselves
provide a two-dimensional signal space representation. Three different constellations are
shown here: QPSK/4PSK/4QAM, 8PSK and 16QAM.


Example 6.2.1 (Signal space for two-dimensional modulation): Consider a single complex-valued
symbol $b = b_c + j b_s$ (assume that there is no intersymbol interference) sent using
two-dimensional passband linear modulation. The set of possible transmitted signals is given by

$$s_{b_c,b_s}(t) = b_c\, p(t) \cos 2\pi f_c t - b_s\, p(t) \sin 2\pi f_c t$$

where $(b_c, b_s)$ takes $M$ possible values for an $M$-ary constellation (e.g., $M = 4$ for QPSK, $M = 16$
for 16QAM), and where $p(t)$ is a baseband pulse of bandwidth smaller than the carrier frequency
$f_c$. Setting $\phi_c(t) = p(t) \cos 2\pi f_c t$ and $\phi_s(t) = -p(t) \sin 2\pi f_c t$, we see that we can write the set of
transmitted signals as a linear combination of these signals as follows:

$$s_{b_c,b_s}(t) = b_c\, \phi_c(t) + b_s\, \phi_s(t)$$

so that the signal space has dimension at most 2. From Chapter 2, we know that $\phi_c$ and $\phi_s$
are orthogonal (I-Q orthogonality), and hence linearly independent. Thus, the signal space has
dimension exactly 2. Noting that $\|\phi_c\|^2 = \|\phi_s\|^2 = \frac{1}{2}\|p\|^2$, the normalized versions of $\phi_c$ and $\phi_s$
provide an orthonormal basis for the signal space:

$$\psi_c(t) = \frac{\phi_c(t)}{\|\phi_c\|}, \quad \psi_s(t) = \frac{\phi_s(t)}{\|\phi_s\|}$$

We can now write

$$s_{b_c,b_s}(t) = \frac{1}{\sqrt{2}} \|p\|\, b_c\, \psi_c(t) + \frac{1}{\sqrt{2}} \|p\|\, b_s\, \psi_s(t)$$

With respect to this basis, the signals can be represented as two-dimensional vectors:

$$s_{b_c,b_s}(t) \leftrightarrow \mathbf{s}_{b_c,b_s} = \frac{1}{\sqrt{2}} \|p\| \begin{pmatrix} b_c \\ b_s \end{pmatrix}$$

That is, up to scaling, the signal space representation for the transmitted signals is simply the
two-dimensional symbols $(b_c, b_s)^T$. Indeed, while we have been careful about keeping track of
the scaling factor in this example, we shall drop it henceforth, because, as we shall soon see,
what matters in performance is the signal-to-noise ratio, rather than the absolute signal or noise
strength.
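The claims in Example 6.2.1 can be checked numerically for a specific pulse. The sketch below is only an illustration under our own assumptions: a half-sine pulse on $[0, 1]$ and a carrier frequency of 10 cycles per unit time, neither of which comes from the text, and a simple midpoint rule for the integrals. It uses only the Python standard library.

```python
import math

FC = 10.0  # assumed carrier frequency (cycles per unit time), for illustration only

def p(t):
    """Assumed baseband pulse: half-sine on [0, 1], zero elsewhere."""
    return math.sin(math.pi * t) if 0.0 <= t <= 1.0 else 0.0

def phi_c(t):
    return p(t) * math.cos(2.0 * math.pi * FC * t)

def phi_s(t):
    return -p(t) * math.sin(2.0 * math.pi * FC * t)

def inner(f, g, t0=0.0, t1=1.0, n=20000):
    """Midpoint-rule approximation of the inner product, the integral of f(t) g(t) dt."""
    dt = (t1 - t0) / n
    return sum(f(t0 + (k + 0.5) * dt) * g(t0 + (k + 0.5) * dt) for k in range(n)) * dt

energy_p = inner(p, p)          # ||p||^2
energy_c = inner(phi_c, phi_c)  # ||phi_c||^2, expected to be ||p||^2 / 2
energy_s = inner(phi_s, phi_s)  # ||phi_s||^2, expected to be ||p||^2 / 2
cross = inner(phi_c, phi_s)     # expected to be 0 (I-Q orthogonality)

def signal_vector(bc, bs):
    """Coordinates of s(t) = bc*phi_c(t) + bs*phi_s(t) in the orthonormal basis (psi_c, psi_s)."""
    def s(t):
        return bc * phi_c(t) + bs * phi_s(t)
    return (inner(s, phi_c) / math.sqrt(energy_c),
            inner(s, phi_s) / math.sqrt(energy_s))

vc, vs = signal_vector(1.0, -1.0)
scale = math.sqrt(energy_p / 2.0)  # the factor ||p|| / sqrt(2) from the example
print(f"||p||^2={energy_p:.4f}, ||phi_c||^2={energy_c:.4f}, <phi_c,phi_s>={cross:.2e}")
print(f"vector = ({vc:.4f}, {vs:.4f}), expected ({scale * 1.0:.4f}, {scale * -1.0:.4f})")
```

The coordinates come out as the symbol $(b_c, b_s)$ scaled by $\|p\|/\sqrt{2}$, which is the scale factor that the text drops from this point on.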


Orthogonal modulation provides another example where an orthonormal basis for the signal
space is immediately obvious. For example, if $s_1, \ldots, s_M$ are orthogonal signals with equal energy
$\|s_i\|^2 = E_s$, then $\psi_i(t) = s_i(t)/\sqrt{E_s}$ provide an orthonormal basis for the signal space, and the vector
representation of the $i$th signal is the scaled unit vector $\sqrt{E_s}\,(0, \ldots, 0, 1, 0, \ldots, 0)^T$, with the 1 in the $i$th position.

Yet another example where an orthonormal basis can be determined by inspection is shown in
Figures 6.5 and 6.6, and discussed in Example 6.2.2.


Figure 6.5: Four signals spanning a three-dimensional signal space

Figure 6.6: An orthonormal basis for the signal set in Figure 6.5, obtained by inspection.


Example 6.2.2 (Developing a signal space representation for a 4-ary signal set): Consider
the example depicted in Figure 6.5, where there are 4 possible transmitted signals, $s_0, \ldots, s_3$.
It is clear from inspection that these span a three-dimensional signal space, with a convenient
choice of basis signals

$$\psi_0(t) = I_{[0,1]}(t), \quad \psi_1(t) = I_{[1,2]}(t), \quad \psi_2(t) = I_{[2,3]}(t)$$

as shown in Figure 6.6. Let $\mathbf{s}_i = (s_i[0], s_i[1], s_i[2])^T$ denote the vector representation of the signal
$s_i$ with respect to the basis, for $i = 0, 1, 2, 3$. That is, the coefficients of the vector $\mathbf{s}_i$ are such
that

$$s_i(t) = \sum_{k=0}^{2} s_i[k]\, \psi_k(t)$$

The entries of each vector $\mathbf{s}_i$ can then be read off, again by inspection, from Figure 6.5.


Now that we have seen some examples, it is time to be more precise about what we mean
by the "signal space." The signal space $S$ is the finite-dimensional subspace (of dimension
$n \le M$) spanned by $s_0(t), \ldots, s_{M-1}(t)$. That is, $S$ consists of all signals of the form $a_0 s_0(t) +
\ldots + a_{M-1} s_{M-1}(t)$, where $a_0, \ldots, a_{M-1}$ are arbitrary scalars. Let $\psi_0(t), \ldots, \psi_{n-1}(t)$ denote an
orthonormal basis for $S$. We have seen in the preceding examples that such a basis can often be
determined by inspection. In general, however, given an arbitrary set of signals, we can always
construct an orthonormal basis using the Gram-Schmidt procedure described below. We do not
need to use this procedure often (in most settings of interest, the way to go from continuous to
discrete time is clear), but state it below for completeness.


Gram-Schmidt orthogonalization: The idea is to build up an orthonormal basis step by
step, with the basis after the $m$th step spanning the first $m$ signals. The first basis function is
a scaled version of the first signal (assuming this is nonzero; otherwise we proceed to the second
signal without adding a basis function). We then consider the component of the second signal
orthogonal to the first basis function. This component is nonzero if the second signal is linearly
independent of the first; in this case, we introduce a basis function that is a scaled version of
this component. See Figure 6.7. This procedure goes on until we have covered all $M$ signals. The
number of basis functions $n$ equals the dimension of the signal space, and satisfies $n \le M$. We
can summarize the procedure as follows.

Letting $S_{k-1}$ denote the subspace spanned by $s_0, \ldots, s_{k-1}$, the Gram-Schmidt algorithm proceeds
iteratively: given an orthonormal basis for $S_{k-1}$, it finds an orthonormal basis for $S_k$. The
procedure stops once all $M$ signals have been processed. The method is identical to that used for
finite-dimensional vectors, except that the definition of the inner product involves an integral,
rather than a sum, for the continuous-time signals considered here.

Figure 6.7: Illustrating Step 0 and Step 1 of the Gram-Schmidt procedure.

Step 0 (Initialization): Let $\phi_0 = s_0$. If $\phi_0 \neq 0$, then set $\psi_0 = \phi_0 / \|\phi_0\|$. Note that $\psi_0$ provides a
basis function for $S_0$.

Step k: Suppose that we have constructed an orthonormal basis $B_{k-1} = \{\psi_0, \ldots, \psi_{m-1}\}$ for the
subspace $S_{k-1}$ spanned by the first $k$ signals, $s_0, \ldots, s_{k-1}$ (note that $m \le k$). Define

$$\phi_k(t) = s_k(t) - \sum_{i=0}^{m-1} \langle s_k, \psi_i \rangle\, \psi_i(t)$$

The signal $\phi_k(t)$ is the component of $s_k(t)$ orthogonal to the subspace $S_{k-1}$. If $\phi_k \neq 0$, define
a new basis function $\psi_m(t) = \phi_k(t) / \|\phi_k\|$ and update the basis as $B_k = \{\psi_0, \ldots, \psi_{m-1}, \psi_m\}$. If $\phi_k = 0$,
then $s_k \in S_{k-1}$, and it is not necessary to update the basis; in this case, we set $B_k = B_{k-1} =
\{\psi_0, \ldots, \psi_{m-1}\}$.

The procedure terminates after step $M-1$, which yields a basis $B = \{\psi_0, \ldots, \psi_{n-1}\}$ for the signal space
$S = S_{M-1}$. The basis is not unique, and may depend (and typically does depend) on the order in
which we go through the signals in the set. We use the Gram-Schmidt procedure here mainly as
a conceptual tool, in assuring us that there is indeed a finite-dimensional vector representation
for a finite set of continuous-time signals.

Figure 6.8: An orthonormal basis for the signal set in Figure 6.5, obtained by applying the
Gram-Schmidt procedure. The unknowns A, B, and C are to be determined in Exercise 6.2.1.


Exercise 6.2.1 (Application of the Gram-Schmidt procedure): Apply the Gram-Schmidt
procedure to the signal set in Figure 6.5. When the signals are considered in increasing order of
index in the Gram-Schmidt procedure, verify that the basis signals are as in Figure 6.8, and fill
in the missing numbers. While the basis thus obtained is not as "nice" as the one obtained by
inspection in Figure 6.6, the Gram-Schmidt procedure has the advantage of general applicability.
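The Gram-Schmidt steps can be sketched in code by representing each signal through its samples, so that the inner product integral becomes a sum scaled by the sample spacing. The following is a minimal sketch under that assumption, using only the Python standard library. The four signals below are hypothetical piecewise-constant waveforms on $[0, 3]$ chosen for illustration; they are not the signals of Figure 6.5.

```python
import math

def inner(u, v, dt=1.0):
    """Inner product of two sampled signals: approximates the integral of u(t) v(t) dt."""
    return sum(a * b for a, b in zip(u, v)) * dt

def gram_schmidt(signals, dt=1.0, tol=1e-9):
    """Return an orthonormal basis (list of sampled signals) for the span of `signals`."""
    basis = []
    for s in signals:
        # phi_k = s_k - sum_i <s_k, psi_i> psi_i : component orthogonal to the current span
        phi = list(s)
        for psi in basis:
            c = inner(s, psi, dt)
            phi = [x - c * b for x, b in zip(phi, psi)]
        norm = math.sqrt(inner(phi, phi, dt))
        if norm > tol:
            # New direction found: normalize it and add it to the basis.
            basis.append([x / norm for x in phi])
        # Otherwise s_k already lies in the span, and the basis is left unchanged.
    return basis

# Hypothetical signals, each constant on [0,1), [1,2), [2,3) (one sample per unit interval).
signals = [
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0],
    [1.0, 2.0, 1.0],   # equals the sum of the first two, so it adds no new dimension
    [1.0, 0.0, 0.0],
]

basis = gram_schmidt(signals)
coords = [[inner(s, psi) for psi in basis] for s in signals]

print("dimension of signal space:", len(basis))
for i, c in enumerate(coords):
    print(f"s_{i} ->", ["%.4f" % x for x in c])
```

Reordering the list `signals` generally produces a different basis for the same signal space, which illustrates the remark that the Gram-Schmidt basis is not unique.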


Inner  products  are  preserved:  We  shall  soon  see  that  the  performance  of  M-ary  signaling 
in  AWGN  depends  only  on  the  inner  products  between  the  signals,  if  the  noise  PSD  is  fixed. 
Thus,  an  important  observation  when  mapping  the  continuous  time  hypothesis  testing  problem 
to  discrete  time  is  to  check  that  these  inner  products  are  preserved  when  projecting  onto  the 
signal  space.  Consider  the  continuous  time  inner  products 

$$\langle s_i, s_j \rangle = \int s_i(t)\, s_j(t)\, dt, \quad i, j = 0, 1, \ldots, M-1 \tag{6.19}$$

Now, expressing the signals in terms of their basis expansions, we have

$$s_i(t) = \sum_{k=0}^{n-1} s_i[k]\, \psi_k(t), \quad i = 0, 1, \ldots, M-1$$

Plugging into (6.19), we obtain

$$\langle s_i, s_j \rangle = \int \sum_{k=0}^{n-1} s_i[k]\, \psi_k(t) \sum_{l=0}^{n-1} s_j[l]\, \psi_l(t)\, dt$$

Interchanging integral and summations, we obtain

$$\langle s_i, s_j \rangle = \sum_{k=0}^{n-1} \sum_{l=0}^{n-1} s_i[k]\, s_j[l] \int \psi_k(t)\, \psi_l(t)\, dt$$

By the orthonormality of the basis functions $\{\psi_k\}$, we have

$$\langle \psi_k, \psi_l \rangle = \int \psi_k(t)\, \psi_l(t)\, dt = \delta_{kl} = \begin{cases} 1, & k = l \\ 0, & k \neq l \end{cases}$$

This collapses the two summations into one, so that we obtain

$$\langle s_i, s_j \rangle = \int s_i(t)\, s_j(t)\, dt = \sum_{k=0}^{n-1} s_i[k]\, s_j[k] = \langle \mathbf{s}_i, \mathbf{s}_j \rangle \tag{6.20}$$

where the extreme right-hand side is the inner product of the signal vectors $\mathbf{s}_i = (s_i[0], \ldots, s_i[n-1])^T$
and $\mathbf{s}_j = (s_j[0], \ldots, s_j[n-1])^T$. This makes sense: the geometric relationship between signals
(which is what the inner products capture) should not depend on the basis with respect to which
they are expressed.
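As a quick sanity check of (6.20), we can compare a numerically computed continuous-time inner product with the inner product of the coefficient vectors. The sketch below assumes, purely for illustration, the unit-interval indicator basis $\psi_k(t) = I_{[k,k+1)}(t)$ and two hypothetical coefficient vectors; it uses only the Python standard library.

```python
def waveform(coeffs, t):
    """s(t) = sum_k coeffs[k] * psi_k(t), with psi_k the indicator of [k, k+1)."""
    k = int(t // 1)
    return coeffs[k] if 0 <= k < len(coeffs) else 0.0

def inner_continuous(a, b, samples_per_unit=1000):
    """Midpoint-rule approximation of the integral of s_a(t) s_b(t) dt."""
    dt = 1.0 / samples_per_unit
    n = max(len(a), len(b)) * samples_per_unit
    return sum(waveform(a, (j + 0.5) * dt) * waveform(b, (j + 0.5) * dt)
               for j in range(n)) * dt

def inner_vector(a, b):
    """Inner product of the signal space vectors."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical coefficient vectors with respect to the indicator basis.
s_a = [1.0, 1.0, 0.0]
s_b = [0.0, 2.0, 1.0]

ip_time = inner_continuous(s_a, s_b)
ip_vec = inner_vector(s_a, s_b)
print(f"continuous-time inner product = {ip_time:.6f}, vector inner product = {ip_vec:.6f}")
```

The same comparison works for energies by taking both arguments equal, since $\|s_i\|^2 = \langle s_i, s_i \rangle$.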


6.2.2  Modeling  WGN  in  signal  space 

What happens to the noise when we project onto the signal space? Define the noise projection
onto the $i$th basis function as

$$N_i = \langle n, \psi_i \rangle = \int n(t)\, \psi_i(t)\, dt, \quad i = 0, 1, \ldots, n-1 \tag{6.21}$$

Then we can write the noise $n(t)$ as follows:

$$n(t) = \sum_{i=0}^{n-1} N_i\, \psi_i(t) + n^{\perp}(t)$$

where $n^{\perp}(t)$ is the projection of the noise orthogonal to the signal space. Thus, we can decompose
the noise into two parts: a noise vector $\mathbf{N} = (N_0, \ldots, N_{n-1})^T$ corresponding to the projection
onto the signal space, and a component $n^{\perp}(t)$ orthogonal to the signal space. In order to characterize
the statistics of these quantities, we need to consider random variables obtained by linear
processing of WGN. Specifically, consider random variables generated by passing WGN through
correlators:

$$Z_1 = \int n(t)\, u_1(t)\, dt = \langle n, u_1 \rangle$$

$$Z_2 = \int n(t)\, u_2(t)\, dt = \langle n, u_2 \rangle$$

where $u_1$ and $u_2$ are deterministic, finite energy signals. We can now state the following result.


Theorem 6.2.1 (WGN through correlators): The random variables $Z_1 = \langle n, u_1 \rangle$ and $Z_2 =
\langle n, u_2 \rangle$ are zero mean, jointly Gaussian, with

$$\mathrm{cov}(Z_1, Z_2) = \mathrm{cov}(\langle n, u_1 \rangle, \langle n, u_2 \rangle) = \sigma^2 \langle u_1, u_2 \rangle$$

Specializing to $u_1 = u_2 = u$, we obtain that

$$\mathrm{var}(\langle n, u \rangle) = \mathrm{cov}(\langle n, u \rangle, \langle n, u \rangle) = \sigma^2 \|u\|^2$$

Thus, we obtain that $\mathbf{Z} = (Z_1, Z_2)^T \sim N(\mathbf{0}, \mathbf{C})$ with covariance matrix

$$\mathbf{C} = \begin{pmatrix} \sigma^2 \|u_1\|^2 & \sigma^2 \langle u_1, u_2 \rangle \\ \sigma^2 \langle u_1, u_2 \rangle & \sigma^2 \|u_2\|^2 \end{pmatrix}$$


Proof of Theorem 6.2.1: The random variables $Z_1 = \langle n, u_1 \rangle$ and $Z_2 = \langle n, u_2 \rangle$ are zero mean
and jointly Gaussian, since $n$ is zero mean and Gaussian. Their covariance is computed as

$$\mathrm{cov}(\langle n, u_1 \rangle, \langle n, u_2 \rangle) = E[\langle n, u_1 \rangle \langle n, u_2 \rangle] = E\left[\int n(t)\, u_1(t)\, dt \int n(s)\, u_2(s)\, ds\right]$$

$$= \int\!\!\int u_1(t)\, u_2(s)\, E[n(t)\, n(s)]\, dt\, ds = \int\!\!\int u_1(t)\, u_2(s)\, \sigma^2 \delta(t - s)\, dt\, ds$$

$$= \sigma^2 \int u_1(t)\, u_2(t)\, dt = \sigma^2 \langle u_1, u_2 \rangle$$

The preceding computation is entirely analogous to the ones we did in Example 5.8.2 and in
Section 5.10, but it is important enough that we repeat some points that we had mentioned
then. First, we need to use two different variables of integration, $t$ and $s$, in order to make sure
we capture all the cross terms. Second, when we take the expectation inside the integrals, we
must group all random terms inside it. Third, the two integrals collapse into one because the
autocorrelation function of WGN is impulsive. Finally, specializing the covariance to get the
variance leads to the remaining results stated in the theorem. □

We  can  now  provide  the  following  geometric  interpretation  of  WGN. 

Remark 6.2.1 (Geometric interpretation of WGN): Theorem 6.2.1 implies that the projection
of WGN along any "direction" in the space of signals (i.e., the result of correlating WGN
with a unit energy signal) has variance $\sigma^2 = N_0/2$. Also, its projections in orthogonal directions
are jointly Gaussian and uncorrelated random variables, and are therefore independent.

Noise projection on the signal space is discrete time WGN: It follows from the preceding
remark that the noise projections $N_i = \langle n, \psi_i \rangle$ along the orthonormal basis functions $\{\psi_i\}$ for
the signal space are i.i.d. $N(0, \sigma^2)$ random variables. In other words, the noise vector $\mathbf{N} =
(N_0, \ldots, N_{n-1})^T \sim N(\mathbf{0}, \sigma^2 \mathbf{I})$. That is, the components of $\mathbf{N}$ constitute discrete time white
Gaussian noise ("white" in this case means uncorrelated and having equal variance across all
components).
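Theorem 6.2.1 can be illustrated with a small Monte Carlo experiment. The sketch below rests on an assumption that is ours rather than the text's: continuous-time WGN with PSD $\sigma^2$ is approximated on a grid of spacing `dt` by independent Gaussian samples of variance $\sigma^2/dt$, so that the correlator integrals become sums scaled by `dt`. The correlating signals are hypothetical overlapping rectangular pulses, and only the Python standard library is used.

```python
import math
import random

def correlator_outputs(u1, u2, sigma2, dt, trials, seed=1):
    """Simulate (Z1, Z2) = (<n, u1>, <n, u2>) for sampled signals u1 and u2."""
    rng = random.Random(seed)
    std = math.sqrt(sigma2 / dt)  # per-sample standard deviation of the discretized WGN
    pairs = []
    for _ in range(trials):
        noise = [rng.gauss(0.0, std) for _ in range(len(u1))]
        z1 = sum(n * a for n, a in zip(noise, u1)) * dt
        z2 = sum(n * b for n, b in zip(noise, u2)) * dt
        pairs.append((z1, z2))
    return pairs

def second_moments(pairs):
    """Sample variances of Z1 and Z2, and their sample covariance."""
    n = len(pairs)
    m1 = sum(z1 for z1, _ in pairs) / n
    m2 = sum(z2 for _, z2 in pairs) / n
    var1 = sum((z1 - m1) ** 2 for z1, _ in pairs) / n
    var2 = sum((z2 - m2) ** 2 for _, z2 in pairs) / n
    cov = sum((z1 - m1) * (z2 - m2) for z1, z2 in pairs) / n
    return var1, var2, cov

DT = 0.1
SIGMA2 = 1.0
U1 = [1.0] * 10 + [0.0] * 5   # unit-height pulse on [0, 1)
U2 = [0.0] * 5 + [1.0] * 10   # unit-height pulse on [0.5, 1.5)

var1, var2, cov = second_moments(correlator_outputs(U1, U2, SIGMA2, DT, trials=20000))

# Theory: var(Z_i) = sigma^2 ||u_i||^2 = 1 and cov(Z1, Z2) = sigma^2 <u1, u2> = 0.5
print(f"var(Z1)={var1:.3f}, var(Z2)={var2:.3f}, cov(Z1,Z2)={cov:.3f}")
```

Replacing `U2` by a pulse that does not overlap `U1` should drive the sample covariance towards zero, in line with the statement that projections along orthogonal directions are uncorrelated.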


6.2.3  Hypothesis  testing  in  signal  space 

Figure 6.9: Illustration of signal space concepts. The noise projection $n^{\perp}(t)$ orthogonal to the
signal space is irrelevant. The relevant part of the received signal is the projection onto the signal
space, which equals the vector $\mathbf{Y} = \mathbf{s}_i + \mathbf{N}$ under hypothesis $H_i$.


Now  that  we  have  the  signal  and  noise  models,  we  can  put  them  together  in  our  hypothesis 
testing  framework.  Let  us  condition  on  hypothesis  H^.  The  received  signal  is  given  by 

y{t)  =  Si(t)  +  n(t)  (6.22) 

Projecting  this  onto  the  signal  space  by  correlating  against  the  orthonormal  basis  functions,  we 
get 

Y[k]  =  (; y ,  ipk)  =  (si  +  n,  i>k)  =  Si[k }  +  N[k]  ,  k  =  0, 1., , ,  .n  -  1 
Collecting  these  into  an  n-dimensional  vector,  we  get  the  model 

Hi  :  Y  =  Si  +  N 

Note  that  the  vector  Y  =  (r/[l], ...,  y[n])T  completely  describes  the  component  of  the  received 
signal  y(t)  in  the  signal  space,  given  by 


7i—l  n—  1 

ys(t)  =  = 

j= o  j= o 

The  component  of  the  received  signal  orthogonal  to  the  signal  space  is  given  by 

y±(t)  =  y{t)  -  ys(t) 

It is shown in Appendix 6.A that this component is irrelevant to our decision. There are two reasons for this, as elaborated in the appendix: first, there is no signal contribution orthogonal to the signal space (by definition); second, for the WGN model, the noise component orthogonal to the signal space carries no information regarding the noise vector in the signal space. As illustrated in Figure 6.9, this enables us to reduce our infinite-dimensional problem to the following finite-dimensional vector model, without loss of optimality.

Model for received vector in signal space

$$H_i : \mathbf{Y} = \mathbf{s}_i + \mathbf{N}, \quad i = 0, 1, \ldots, M-1 \qquad (6.23)$$

where $\mathbf{N} \sim N(\mathbf{0}, \sigma^2 \mathbf{I})$.
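As a hedged illustration of the projection step, the following sketch works in a sampled (finite-length) stand-in for the waveform space. The basis vectors, the received samples, and the helper names `inner` and `project` are assumptions chosen for illustration, not taken from the text.

```python
import math

def inner(u, v):
    """Inner product of two equal-length real sample vectors."""
    return sum(a * b for a, b in zip(u, v))

def project(y, basis):
    """Return (Y, y_S, y_perp) for an orthonormal basis.

    Y[k] = <y, psi_k> are the signal space coordinates, y_S is the
    component of y in the span of the basis, and y_perp = y - y_S.
    """
    Y = [inner(y, psi) for psi in basis]
    y_S = [sum(Yk * psi[t] for Yk, psi in zip(Y, basis)) for t in range(len(y))]
    y_perp = [a - b for a, b in zip(y, y_S)]
    return Y, y_S, y_perp

# Hypothetical orthonormal basis (n = 2) inside a 4-sample "waveform" space.
r = 1 / math.sqrt(2)
basis = [[r, r, 0.0, 0.0], [0.0, 0.0, r, -r]]
y = [1.0, 3.0, 2.0, 5.0]  # arbitrary received samples

Y, y_S, y_perp = project(y, basis)
# y_perp has zero correlation with every basis function, so a correlator
# matched to any signal in the span never sees it.
residual = [inner(y_perp, psi) for psi in basis]
```

The point of the sketch is only that the remainder `y_perp` is invisible to any correlation against signals in the span of the basis; the statistical irrelevance argument is the one in the appendix.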


Figure 6.10: A signal space view of QPSK. In the scenario shown, $\mathbf{s}_0$ is the transmitted vector, and $\mathbf{Y} = \mathbf{s}_0 + \mathbf{N}$ is the received vector after noise is added. The noise components $N_c$, $N_s$ are i.i.d. $N(0, \sigma^2)$ random variables.


Two-dimensional  modulation  (Example  6.2.1  revisited):  For  a  single  symbol  sent  using 
two-dimensional  modulation,  we  have  the  hypotheses 

$$H_{b_c, b_s} : y(t) = s_{b_c, b_s}(t) + n(t)$$

where

$$s_{b_c, b_s}(t) = b_c p(t) \cos 2\pi f_c t - b_s p(t) \sin 2\pi f_c t$$

Restricting attention to the two-dimensional signal space identified in the example, we obtain the model

$$Y_c = b_c + N_c, \qquad Y_s = b_s + N_s$$

where we have absorbed scale factors into the symbol $(b_c, b_s)$, and where the I and Q noise components $N_c$, $N_s$ are i.i.d. $N(0, \sigma^2)$. This is illustrated for QPSK in Figure 6.10. Thus, conditioned on $H_{b_c, b_s}$, $Y_c \sim N(b_c, \sigma^2)$ and $Y_s \sim N(b_s, \sigma^2)$, and $Y_c$, $Y_s$ are conditionally independent. The conditional density of $\mathbf{Y} = (Y_c, Y_s)^T$ conditioned on $H_{b_c, b_s}$ is therefore given by


$$p(y_c, y_s | b_c, b_s) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(y_c - b_c)^2/(2\sigma^2)} \cdot \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(y_s - b_s)^2/(2\sigma^2)}$$

We can now infer the ML and MPE rules using our hypothesis testing framework. However, since
the  same  reasoning  applies  to  signal  spaces  of  arbitrary  dimensions,  we  provide  a  more  general 
discussion  in  the  next  section,  and  then  return  to  examples  of  two-dimensional  modulation. 


6.2.4  Optimal  Reception  in  AWGN 


We begin by characterizing the optimal receiver when the received signal is a finite-dimensional vector. Using this, we infer the optimal receiver for continuous-time received signals.

Demodulation for M-ary signaling in discrete time AWGN corresponds to solving an M-ary hypothesis testing problem with observation model as follows:

$$H_i : \mathbf{Y} = \mathbf{s}_i + \mathbf{N}, \quad i = 0, 1, \ldots, M-1 \qquad (6.24)$$

where $\mathbf{N} \sim N(\mathbf{0}, \sigma^2 \mathbf{I})$ is discrete time WGN. The ML and MPE rules for this problem are given as follows. As usual, we denote the prior probabilities required to specify the MPE rule by $\{\pi_i, i = 0, 1, \ldots, M-1\}$ ($\sum_{i=0}^{M-1} \pi_i = 1$).

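Optimal Demodulation for Signaling in Discrete Time AWGN

ML rule

$$\delta_{ML}(\mathbf{y}) = \arg\min_{0 \le i \le M-1} \|\mathbf{y} - \mathbf{s}_i\|^2 = \arg\max_{0 \le i \le M-1} \langle \mathbf{y}, \mathbf{s}_i \rangle - \frac{\|\mathbf{s}_i\|^2}{2} \qquad (6.25)$$

MPE rule

$$\delta_{MPE}(\mathbf{y}) = \arg\min_{0 \le i \le M-1} \|\mathbf{y} - \mathbf{s}_i\|^2 - 2\sigma^2 \log \pi_i = \arg\max_{0 \le i \le M-1} \langle \mathbf{y}, \mathbf{s}_i \rangle - \frac{\|\mathbf{s}_i\|^2}{2} + \sigma^2 \log \pi_i \qquad (6.26)$$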


Interpretation  of  optimal  decision  rules:  The  ML  rule  can  be  interpreted  in  two  ways. 
The  first  is  as  a  minimum  distance  rule,  choosing  the  transmitted  signal  which  has  minimum 
Euclidean  distance  to  the  noisy  received  signal.  The  second  is  as  a  “template  matcher” :  choosing 
the  transmitted  signal  with  highest  correlation  with  the  noisy  received  signal,  while  adjusting 
for  the  fact  that  the  energies  of  different  transmitted  signals  may  be  different.  The  MPE  rule 
adjusts  the  ML  cost  function  to  reflect  prior  information:  the  adjustment  term  depends  on  the 
noise  level  and  the  prior  probabilities.  The  MPE  cost  functions  decompose  neatly  into  a  sum  of 
the  ML  cost  function  (which  depends  on  the  observation)  and  a  term  reflecting  prior  knowledge 
(which depends on the prior probabilities and the noise level). The latter term scales with the noise variance $\sigma^2$. Thus, we rely more on the observation at high SNR (small $\sigma$), and more on prior knowledge at low SNR (large $\sigma$).
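The two forms of the ML rule in (6.25) and the correlator form of the MPE rule in (6.26) can be sketched as follows. The three-point constellation, the received vector, and the priors are hypothetical values chosen so that the prior term flips the decision at low SNR; the function names are likewise illustrative.

```python
import math

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def ml_min_distance(y, signals):
    """ML rule, minimum distance form of (6.25)."""
    def dist2(i):
        return sum((a - b) ** 2 for a, b in zip(y, signals[i]))
    return min(range(len(signals)), key=dist2)

def ml_correlator(y, signals):
    """ML rule, correlator form of (6.25): <y, s_i> - ||s_i||^2 / 2."""
    def metric(i):
        return inner(y, signals[i]) - inner(signals[i], signals[i]) / 2
    return max(range(len(signals)), key=metric)

def mpe_correlator(y, signals, priors, sigma2):
    """MPE rule, correlator form of (6.26): ML metric plus sigma^2 * log(pi_i)."""
    def metric(i):
        ml = inner(y, signals[i]) - inner(signals[i], signals[i]) / 2
        return ml + sigma2 * math.log(priors[i])
    return max(range(len(signals)), key=metric)

signals = [[0.0, 0.0], [2.0, 0.0], [0.0, 1.0]]  # hypothetical, unequal energies
priors = [0.1, 0.8, 0.1]                        # hypothetical priors
y = [0.9, 0.2]                                  # hypothetical received vector

ml_decision = ml_correlator(y, signals)
mpe_low_snr = mpe_correlator(y, signals, priors, sigma2=1.0)
mpe_high_snr = mpe_correlator(y, signals, priors, sigma2=0.01)
```

With these values the ML rule picks index 0 in both forms, the MPE rule at large noise variance follows the dominant prior (index 1), and at small noise variance it agrees with the ML rule, which is the behavior described in the interpretation above.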

Derivation of optimal receiver structures (6.25) and (6.26): Under hypothesis $H_i$, $\mathbf{Y}$ is a Gaussian random vector with mean $\mathbf{s}_i$ and covariance matrix $\sigma^2 \mathbf{I}$ (the translation of the noise vector $\mathbf{N}$ by the deterministic signal vector $\mathbf{s}_i$ does not change the covariance matrix), so that

$$p_{\mathbf{Y}|i}(\mathbf{y} | H_i) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left(-\frac{\|\mathbf{y} - \mathbf{s}_i\|^2}{2\sigma^2}\right) \qquad (6.27)$$

Plugging (6.27) into the ML rule (6.5), we obtain the rule (6.25) upon simplification. Similarly, we obtain (6.26) by substituting (6.27) in the MPE rule (6.6). □

We  now  map  the  optimal  decision  rules  in  discrete  time  back  to  continuous  time  to  obtain  optimal 
detectors  for  the  original  continuous-time  model  (6.18),  as  follows. 
Optimal Demodulation for Signaling in Continuous Time AWGN

ML rule

$$\delta_{ML}(y) = \arg\max_{0 \le i \le M-1} \langle y, s_i \rangle - \frac{\|s_i\|^2}{2} \qquad (6.28)$$

MPE rule

$$\delta_{MPE}(y) = \arg\max_{0 \le i \le M-1} \langle y, s_i \rangle - \frac{\|s_i\|^2}{2} + \sigma^2 \log \pi_i \qquad (6.29)$$

Derivation of optimal receiver structures (6.28) and (6.29): Due to the irrelevance of $y^{\perp}$, the continuous time model (6.18) reduces to the discrete time model (6.24) by projecting onto the signal space. It remains to map the optimal decision rules (6.25) and (6.26) for discrete time observations back to continuous time. These rules involve correlation between the received and transmitted signals, and the transmitted signal energies. It suffices to show that these quantities are the same for both the continuous time model and the equivalent discrete time model. We know now that signal inner products are preserved, so that

$$\|s_i\|^2 = \|\mathbf{s}_i\|^2$$

Further, the continuous-time correlator output can be written as

$$\langle y, s_i \rangle = \langle y_S + y^{\perp}, s_i \rangle = \langle y_S, s_i \rangle + \langle y^{\perp}, s_i \rangle = \langle y_S, s_i \rangle = \langle \mathbf{y}, \mathbf{s}_i \rangle$$

where $\langle y^{\perp}, s_i \rangle = 0$ because $s_i$ lies in the signal space, and the last equality follows because the inner product between the signals $y_S$ and $s_i$ (which both lie in the signal space) is the same as the inner product between their vector representations.

□ 

Why  don’t  we  have  a  “minimum  distance”  rule  in  continuous  time?  Notice  that  the 
optimal  decision  rules  for  the  continuous  time  model  do  not  contain  the  continuous  time  version 
of  the  minimum  distance  rule  for  discrete  time.  This  is  because  of  a  technical  subtlety.  In 
continuous  time,  the  squares  of  the  distances  would  be 

$$\|y - s_i\|^2 = \|y_S - s_i\|^2 + \|y^{\perp}\|^2 = \|y_S - s_i\|^2 + \|n^{\perp}\|^2$$

Under  the  AWGN  model,  the  noise  power  orthogonal  to  the  signal  space  is  infinite,  hence  from 
a  purely  mathematical  point  of  view,  the  preceding  quantities  are  infinite  for  each  i  (so  that  we 
cannot  minimize  over  i).  Hence,  it  only  makes  sense  to  talk  about  the  minimum  distance  rule 
in a finite-dimensional space in which the noise power is finite. The correlator based form of the optimal detector, on the other hand, automatically achieves the projection onto the finite-dimensional signal space, and hence does not suffer from this technical difficulty. Of course, in practice, even the continuous time received signal may be limited to a finite-dimensional space by
filtering  and  time-limiting,  but  correlator-based  detection  still  has  the  practical  advantage  that 
only  components  of  the  received  signal  which  are  truly  useful  appear  in  the  decision  statistics. 

Bank  of  Correlators  or  Matched  Filters:  The  optimal  receiver  involves  computation  of  the 
decision  statistics 

$$\langle y, s_i \rangle = \int y(t) s_i(t) \, dt$$

and can therefore be implemented using a bank of correlators, as shown in Figure 6.11. Of course, any correlation operation can also be implemented using a matched filter, sampled at the appropriate time. Defining $s_{i,mf}(t) = s_i(-t)$ as the impulse response of the filter matched to $s_i$, we have

$$\langle y, s_i \rangle = \int y(t) s_i(t) \, dt = \int y(t) s_{i,mf}(-t) \, dt = (y * s_{i,mf})(0)$$
Figure 6.11: The optimal receiver for an AWGN channel can be implemented using a bank of correlators. For the ML rule, the constants $a_i = \|s_i\|^2/2$; for the MPE rule, $a_i = \|s_i\|^2/2 - \sigma^2 \log \pi_i$.


Figure 6.12: An alternative implementation for the optimal receiver using a bank of matched filters. For the ML rule, the constants $a_i = \|s_i\|^2/2$; for the MPE rule, $a_i = \|s_i\|^2/2 - \sigma^2 \log \pi_i$.
Figure 6.12 shows an alternative implementation for the optimal receiver using a bank of matched filters.
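The equivalence between a correlator and a matched filter sampled at time zero can be checked in discrete time. This is a minimal sketch under the assumption that the integrals are replaced by sums over samples; the sequences and the helper `convolve_at`, which tracks the time index of the first sample of each finite sequence, are hypothetical.

```python
def correlate(y, s):
    """Discrete-time analogue of <y, s> = integral of y(t) s(t) dt."""
    return sum(a * b for a, b in zip(y, s))

def convolve_at(y, h, y_start, h_start, k):
    """Evaluate (y * h)[k] for finite sequences.

    y_start and h_start are the time indices of y[0] and h[0].
    """
    total = 0.0
    for m, ym in enumerate(y):
        # h is needed at time k - (time of ym); convert that to a list index.
        idx = k - (y_start + m) - h_start
        if 0 <= idx < len(h):
            total += ym * h[idx]
    return total

s = [1.0, 2.0, -1.0]   # hypothetical signal occupying times 0, 1, 2
y = [0.5, 1.5, -2.0]   # hypothetical received samples at the same times

# Matched filter: s_mf[t] = s[-t], so it occupies times -2, -1, 0.
s_mf = list(reversed(s))
s_mf_start = -(len(s) - 1)

corr_output = correlate(y, s)
mf_output = convolve_at(y, s_mf, 0, s_mf_start, k=0)
```

Sampling the filter output at any time other than `k=0` would correspond to correlating against a shifted copy of the signal, which is why the sampling instant matters.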


Figure 6.13: The passband correlations required by the optimal receiver can be implemented in complex baseband. Since the I and Q components are lowpass waveforms, correlation with them is an implicit form of lowpass filtering. Thus, the LPFs after the mixers could potentially be eliminated, which is why they are shown within dashed boxes.


Implementation  in  complex  baseband:  We  have  developed  the  optimal  receiver  structures 
for  real-valued  signals,  so  that  these  apply  to  physical  baseband  and  passband  signals.  However, 
recall  from  Chapter  2  that  correlation  and  filtering  in  passband,  which  is  what  the  optimal  receiver 
does,  can  be  implemented  in  complex  baseband  after  downconversion.  In  particular,  for  passband 
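signals $u_p(t) = u_c(t) \cos 2\pi f_c t - u_s(t) \sin 2\pi f_c t$ and $v_p(t) = v_c(t) \cos 2\pi f_c t - v_s(t) \sin 2\pi f_c t$, the inner product can be written as

$$\langle u_p, v_p \rangle = \frac{1}{2}\left(\langle u_c, v_c \rangle + \langle u_s, v_s \rangle\right) = \frac{1}{2} \mathrm{Re} \langle u, v \rangle \qquad (6.30)$$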


where $u = u_c + j u_s$ and $v = v_c + j v_s$ are the corresponding complex envelopes. Figure 6.13 shows
how  a  passband  correlation  can  be  implemented  in  complex  baseband.  Note  that  we  correlate  the 
I  component  with  the  I  component,  and  the  Q  component  with  the  Q  component.  This  is  because 
our  optimal  receiver  is  based  on  the  assumption  of  coherent  reception:  our  model  assumes  that 
the  receiver  has  exact  copies  of  the  noiseless  transmitted  signals.  Thus,  ideal  carrier  synchronism 
is  implicitly  assumed  in  this  model,  so  that  the  I  and  Q  components  do  not  get  mixed  up  as  they 
would  if  the  receiver’s  LO  were  not  synchronized  to  the  incoming  carrier. 
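The identity (6.30) can be checked numerically. The sketch below assumes a hypothetical carrier frequency, a hypothetical half-sine lowpass pulse on [0, 1], and arbitrary I and Q amplitudes; the integrals are approximated by a midpoint rule, so the agreement is numerical rather than symbolic.

```python
import math

FC = 50.0  # hypothetical carrier frequency (Hz), far above the envelope bandwidth

def pulse(t):
    """Hypothetical lowpass pulse supported on [0, 1]."""
    return math.sin(math.pi * t)

def u(t):
    """Complex envelope u = u_c + j u_s (amplitudes are illustrative)."""
    return complex(1.0 * pulse(t), 0.5 * pulse(t))

def v(t):
    """Complex envelope v = v_c + j v_s (amplitudes are illustrative)."""
    return complex(2.0 * pulse(t), -1.0 * pulse(t))

def passband(envelope, t):
    """Real passband signal: I * cos(2 pi fc t) - Q * sin(2 pi fc t)."""
    z = envelope(t)
    theta = 2 * math.pi * FC * t
    return z.real * math.cos(theta) - z.imag * math.sin(theta)

def integrate(f, a=0.0, b=1.0, n=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

# Left side of (6.30): correlation carried out directly in passband.
passband_ip = integrate(lambda t: passband(u, t) * passband(v, t))
# Right side of (6.30): half the real part of the complex baseband inner product.
baseband_ip = 0.5 * integrate(lambda t: (u(t) * v(t).conjugate()).real)
```

Note that the real part of `u * conj(v)` is exactly the sum of the I-with-I and Q-with-Q products, matching the observation that the coherent receiver correlates I with I and Q with Q.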


6.2.5  Geometry  of  the  ML  decision  rule 

The minimum distance interpretation for the ML decision rule implies that the decision regions (in signal space) for M-ary signaling in AWGN are constructed as follows. Interpret the signal vectors $\{\mathbf{s}_i\}$, and the received vector $\mathbf{y}$, as points in n-dimensional Euclidean space. When deciding between any pair of signals $\mathbf{s}_i$ and $\mathbf{s}_j$ (which are points in n-dimensional space), we draw a line between these points. The decision boundary is the perpendicular bisector of this line, which is an $(n-1)$-dimensional hyperplane. This is illustrated in Figure 6.14, where, because we are constrained to draw on two-dimensional paper, the hyperplane reduces to a line. But we
Figure 6.14: The ML decision boundary when testing between $\mathbf{s}_i$ and $\mathbf{s}_j$ is the perpendicular bisector of the line joining the signal points, which is an $(n-1)$-dimensional hyperplane for an n-dimensional signal space.


can visualize a plane containing the decision boundary coming out of the paper for a three-dimensional signal space. While it is hard to visualize signal spaces of more than 3 dimensions, the computation for deciding which side of the ML decision boundary the received vector $\mathbf{y}$ lies on is straightforward: simply compare the Euclidean distances $\|\mathbf{y} - \mathbf{s}_i\|$ and $\|\mathbf{y} - \mathbf{s}_j\|$.


Figure 6.15: ML decision region $\Gamma_1$ for signal $\mathbf{s}_1$.


The ML decision regions are constructed from drawing these pairwise decision regions. For any given $i$, draw a line between $\mathbf{s}_i$ and $\mathbf{s}_j$ for all $j \ne i$. The perpendicular bisector of the line between $\mathbf{s}_i$ and $\mathbf{s}_j$ defines two half-spaces (half-planes for $n = 2$), one in which we choose $\mathbf{s}_i$ over $\mathbf{s}_j$, the other in which we choose $\mathbf{s}_j$ over $\mathbf{s}_i$. The intersection of the half-spaces in which $\mathbf{s}_i$ is chosen over $\mathbf{s}_j$, for $j \ne i$, defines the decision region $\Gamma_i$. This procedure is illustrated for a two-dimensional signal space in Figure 6.15. The line $L_{1i}$ is the perpendicular bisector of the line between $\mathbf{s}_1$ and $\mathbf{s}_i$. The intersection of the half-planes defined by these lines gives $\Gamma_1$ as shown. Note that $L_{16}$ plays no role in determining $\Gamma_1$, since signal $\mathbf{s}_6$ is "too far" from $\mathbf{s}_1$, in the following sense: if the received signal is closer to $\mathbf{s}_6$ than to $\mathbf{s}_1$, then it is also closer to $\mathbf{s}_i$ than to $\mathbf{s}_1$ for some $i = 2, 3, 4, 5$. This kind of observation plays an important role in the performance analysis of ML reception in Section 6.3.
Figure 6.16: ML decision regions for some two-dimensional constellations.


The preceding procedure can now be applied to the simpler scenario of two-dimensional constellations to obtain ML decision regions as shown in Figure 6.16. For QPSK, the ML regions are simply the four quadrants. For 8PSK, the ML regions are sectors of a circle. For 16QAM, the ML regions take a rectangular form.
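The half-space construction of the decision regions can be sketched for QPSK, where it should reproduce the quadrant rule. The constellation scaling, the test points, and the function names are assumptions made for illustration.

```python
def ml_decide(y, signals):
    """Minimum distance (ML) decision."""
    def dist2(i):
        return sum((a - b) ** 2 for a, b in zip(y, signals[i]))
    return min(range(len(signals)), key=dist2)

def in_region(y, i, signals):
    """True if y lies, for every j != i, on the s_i side of the
    perpendicular bisector of the line joining s_i and s_j."""
    s_i = signals[i]
    for j, s_j in enumerate(signals):
        if j == i:
            continue
        midpoint = [(a + b) / 2 for a, b in zip(s_i, s_j)]
        direction = [a - b for a, b in zip(s_i, s_j)]  # normal to the bisector
        side = sum((yk - mk) * dk for yk, mk, dk in zip(y, midpoint, direction))
        if side < 0:
            return False
    return True

def quadrant_index(y):
    """Index of the QPSK point (in the ordering below) in the same quadrant as y."""
    if y[0] >= 0:
        return 0 if y[1] >= 0 else 3
    return 1 if y[1] >= 0 else 2

# Hypothetical QPSK scaling: one point per quadrant.
qpsk = [(1.0, 1.0), (-1.0, 1.0), (-1.0, -1.0), (1.0, -1.0)]
test_points = [(0.3, 2.0), (-1.5, 0.1), (-0.2, -0.7), (4.0, -0.01)]
decisions = [ml_decide(y, qpsk) for y in test_points]
```

For these test points the minimum distance decision, the intersection-of-half-spaces test, and the quadrant rule all agree.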


6.3  Performance  Analysis  of  ML  Reception 

We  focus  on  performance  analysis  for  the  ML  decision  rule,  assuming  equal  priors  (for  which  the 
ML  rule  minimizes  the  error  probability).  The  analysis  for  MPE  reception  with  unequal  priors 
is  skipped,  but  it  is  a  simple  extension.  We  begin  with  a  geometric  picture  of  how  errors  are 
caused  by  WGN. 


6.3.1  The  Geometry  of  Errors 


In Figure 6.17, suppose that signal $\mathbf{s}$ is sent, and we wish to compute the probability that the noise vector $\mathbf{N}$ causes the received vector to cross a given decision boundary. From the figure, it is clear that $N_{perp}$, the projection of the noise vector perpendicular to the decision boundary, is what determines whether or not we cross the boundary. It does not matter what happens with the component $\mathbf{N}_{par}$ parallel to the boundary. While we draw the picture in two dimensions, the same conclusion holds in general for an n-dimensional signal space, where $\mathbf{s}$ and $\mathbf{N}$ have dimension $n$, $\mathbf{N}_{par}$ has dimension $n-1$, and $N_{perp}$ is still a scalar. Since $N_{perp} \sim N(0, \sigma^2)$ (the projection of WGN in any direction has this distribution), we have

$$P[\text{cross a boundary at distance } D] = P[N_{perp} > D] = Q\left(\frac{D}{\sigma}\right) \qquad (6.31)$$


Now, let us apply the same reasoning to the decision boundary corresponding to making an ML decision between two signals $\mathbf{s}_0$ and $\mathbf{s}_1$, as shown in Figure 6.18. Suppose that $\mathbf{s}_0$ is sent.
Figure 6.17: Only the component of noise perpendicular to the decision boundary, $N_{perp}$, can cause the received vector to cross the decision boundary, starting from the signal point $\mathbf{s}$.


Figure 6.18: When making an ML decision between $\mathbf{s}_0$ and $\mathbf{s}_1$, the decision boundary is at distance $D = d/2$ from each signal point, where $d = \|\mathbf{s}_1 - \mathbf{s}_0\|$ is the Euclidean distance between the two points.
What is the probability that the noise vector $\mathbf{N}$, when added to $\mathbf{s}_0$, sends the received vector into the wrong region by crossing the decision boundary? We know from (6.31) that the answer is $Q(D/\sigma)$, where $D$ is the distance between $\mathbf{s}_0$ and the decision boundary. For ML reception, the decision boundary is the plane that is the perpendicular bisector of the line between $\mathbf{s}_0$ and $\mathbf{s}_1$, whose length equals $d = \|\mathbf{s}_1 - \mathbf{s}_0\|$, the Euclidean distance between the two signal vectors. Thus, $D = d/2 = \|\mathbf{s}_1 - \mathbf{s}_0\|/2$, and the probability of crossing the ML decision boundary between the two signal vectors (starting from either of the two signal points) is

$$P[\text{cross ML boundary between } \mathbf{s}_0 \text{ and } \mathbf{s}_1] = Q\left(\frac{\|\mathbf{s}_1 - \mathbf{s}_0\|}{2\sigma}\right) = Q\left(\frac{\|s_1 - s_0\|}{2\sigma}\right) \qquad (6.32)$$


where  we  note  that  the  Euclidean  distance  between  the  signal  vectors  and  the  corresponding 
continuous  time  signals  is  the  same. 

Notation:  Now  that  we  have  established  the  equivalence  between  working  with  continuous  time 
signals  and  the  vectors  that  represent  their  projections  onto  signal  space,  we  no  longer  need  to 
be  careful  about  distinguishing  between  them.  Accordingly,  we  drop  the  use  of  boldface  notation 
henceforth,  using  the  notation  y,  Si  and  n  to  denote  the  received  signal,  the  transmitted  signal, 
and  the  noise,  respectively,  in  both  settings. 


6.3.2  Performance  with  binary  signaling 


Consider  binary  signaling  in  AWGN,  where  the  received  signal  is  modeled  using  two  hypotheses 
as  follows: 

$$H_1 : y(t) = s_1(t) + n(t)$$
$$H_0 : y(t) = s_0(t) + n(t) \qquad (6.33)$$

Geometric  computation  of  error  probability:  The  ML  decision  boundary  for  this  problem 
is  as  in  Figure  6.18.  The  conditional  error  probability  is  simply  the  probability  that,  starting  from 
one  of  the  signal  points,  the  noise  makes  us  cross  the  boundary  to  the  wrong  side,  the  probability 
of  which  we  have  already  computed  in  (6.32).  Since  the  conditional  error  probabilities  are  equal, 
they  also  equal  the  average  error  probability  regardless  of  the  priors.  We  therefore  obtain  the 
following  expression. 

Error  probability  for  binary  signaling  with  ML  reception 

$$P_{e,ML} = P_{e|1} = P_{e|0} = Q\left(\frac{\|s_1 - s_0\|}{2\sigma}\right) = Q\left(\frac{d}{2\sigma}\right) \qquad (6.34)$$

where $d = \|s_1 - s_0\|$ is the distance between the two possible received signals.
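A quick numerical check of (6.34) is sketched below. The Q function is written in terms of the complementary error function, and a seeded Monte Carlo run with a minimum distance receiver is compared against the formula. The signal points, noise level, trial count, and seed are hypothetical choices.

```python
import math
import random

def Q(x):
    """Gaussian tail probability P[N(0,1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def binary_error_prob(s0, s1, sigma):
    """Formula (6.34): Q(d / (2 sigma)) with d = ||s1 - s0||."""
    return Q(math.dist(s0, s1) / (2 * sigma))

def simulate_error_rate(s0, s1, sigma, trials, seed=0):
    """Estimate P(e|0): send s0 through discrete time WGN, decide by minimum distance."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        y = [a + rng.gauss(0.0, sigma) for a in s0]
        if math.dist(y, s1) < math.dist(y, s0):
            errors += 1
    return errors / trials

s0, s1, sigma = (0.0, 0.0), (3.0, 4.0), 2.5  # hypothetical: d = 5, so d / (2 sigma) = 1
predicted = binary_error_prob(s0, s1, sigma)
estimated = simulate_error_rate(s0, s1, sigma, trials=100_000)
```

The estimate fluctuates with the seed and the number of trials, so it should only be expected to match the formula to within a few standard errors.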

Algebraic computation: While this geometric computation is intuitively pleasing, it is important to also master algebraic approaches to computing the probabilities of errors due to WGN. It is easiest to first consider on-off keying.


$$H_1 : y(t) = s(t) + n(t)$$
$$H_0 : y(t) = n(t) \qquad (6.35)$$

Applying (6.28), we find that the ML rule reduces to

$$\langle y, s \rangle \underset{H_0}{\overset{H_1}{\gtrless}} \frac{\|s\|^2}{2} \qquad (6.36)$$
Setting $Z = \langle y, s \rangle$, we wish to compute the conditional error probabilities given by

$$P_{e|1} = P\left[Z < \frac{\|s\|^2}{2} \,\Big|\, H_1\right], \qquad P_{e|0} = P\left[Z > \frac{\|s\|^2}{2} \,\Big|\, H_0\right] \qquad (6.37)$$

We have actually already done these computations in Example 5.8.2, but it pays to review them quickly. Note that, conditioned on either hypothesis, $Z$ is a Gaussian random variable. The conditional mean and variance of $Z$ under $H_0$ are given by

$$E[Z|H_0] = E[\langle n, s \rangle] = 0$$

$$\mathrm{var}(Z|H_0) = \mathrm{cov}(\langle n, s \rangle, \langle n, s \rangle) = \sigma^2 \|s\|^2$$

where we have used Theorem 6.2.1, and the fact that $n(t)$ has zero mean. The corresponding computation under $H_1$ is as follows:

$$E[Z|H_1] = E[\langle s + n, s \rangle] = \|s\|^2$$

$$\mathrm{var}(Z|H_1) = \mathrm{cov}(\langle s + n, s \rangle, \langle s + n, s \rangle) = \mathrm{cov}(\langle n, s \rangle, \langle n, s \rangle) = \sigma^2 \|s\|^2$$

noting that covariances do not change upon adding constants. Thus, $Z \sim N(0, v^2)$ under $H_0$ and $Z \sim N(m, v^2)$ under $H_1$, where $m = \|s\|^2$ and $v^2 = \sigma^2 \|s\|^2$. Substituting in (6.37), it is easy to check that

$$P_{e|1} = P_{e|0} = Q\left(\frac{\|s\|}{2\sigma}\right) \qquad (6.38)$$

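Going back to the more general binary signaling problem (6.33), the ML rule is given by (6.28) to be

$$\langle y, s_1 \rangle - \frac{\|s_1\|^2}{2} \underset{H_0}{\overset{H_1}{\gtrless}} \langle y, s_0 \rangle - \frac{\|s_0\|^2}{2}$$

We can analyze this system by considering the joint distribution of the correlator statistics $\langle y, s_1 \rangle$ and $\langle y, s_0 \rangle$, which are jointly Gaussian conditioned on each hypothesis. However, it is simpler and more illuminating to rewrite the ML decision rule as

$$\langle y, s_1 - s_0 \rangle \underset{H_0}{\overset{H_1}{\gtrless}} \frac{\|s_1\|^2}{2} - \frac{\|s_0\|^2}{2}$$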

This  is  consistent  with  the  geometry  depicted  in  Figure  6.18:  only  the  projection  of  the  received 
signal  along  the  line  joining  the  signals  matters  in  the  decision,  and  hence  only  the  noise  along 
this  direction  can  produce  errors.  The  analysis  now  involves  the  conditional  distributions  of  the 
single decision statistic $Z = \langle y, s_1 - s_0 \rangle$, which is conditionally Gaussian under either hypothesis. The computation of the conditional error probabilities is left as an exercise, but we already know that the answer should work out to (6.34).
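For readers who want to check their answer to this exercise, one possible route is sketched here. It uses only Theorem 6.2.1 and the zero mean of the noise; the symbols $\tau$, $m_0$ and $m_1$ are shorthand introduced for this sketch and are not the text's notation.

```latex
\begin{align*}
Z &= \langle y, s_1 - s_0 \rangle, \qquad
\tau = \tfrac{1}{2}\left(\|s_1\|^2 - \|s_0\|^2\right), \qquad
d = \|s_1 - s_0\| \\
\text{Under } H_0:\quad
m_0 &= E[Z|H_0] = \langle s_0, s_1 \rangle - \|s_0\|^2, \qquad
\mathrm{var}(Z|H_0) = \sigma^2 d^2 \\
\tau - m_0 &= \tfrac{1}{2}\left(\|s_1\|^2 + \|s_0\|^2 - 2\langle s_0, s_1 \rangle\right) = \tfrac{d^2}{2} \\
P_{e|0} &= P[Z > \tau \,|\, H_0] = Q\left(\frac{\tau - m_0}{\sigma d}\right) = Q\left(\frac{d}{2\sigma}\right) \\
\text{Under } H_1:\quad
m_1 &= E[Z|H_1] = \|s_1\|^2 - \langle s_0, s_1 \rangle, \qquad
\mathrm{var}(Z|H_1) = \sigma^2 d^2 \\
m_1 - \tau &= \tfrac{1}{2}\left(\|s_1\|^2 + \|s_0\|^2 - 2\langle s_0, s_1 \rangle\right) = \tfrac{d^2}{2} \\
P_{e|1} &= P[Z < \tau \,|\, H_1] = Q\left(\frac{m_1 - \tau}{\sigma d}\right) = Q\left(\frac{d}{2\sigma}\right)
\end{align*}
```

Both conditional error probabilities reduce to $Q(d/(2\sigma))$, in agreement with (6.34).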

A quicker approach is to consider a transformed system with received signal $\tilde{y}(t) = y(t) - s_0(t)$. Since this transformation is invertible, the performance of an optimal rule is unchanged under it. But the transformed received signal $\tilde{y}(t)$ falls under the on-off signaling model (6.35), with $s(t) = s_1(t) - s_0(t)$. The ML error probability formula (6.34) therefore follows from the formula (6.38).

Scale Invariance: The formula (6.34) illustrates that the performance of the ML rule is scale-invariant: if we scale the signals and noise by the same factor $\alpha$, the performance does not change, since both $\|s_1 - s_0\|$ and $\sigma$ scale by $\alpha$. Thus, the performance is determined by the ratio of signal and noise strengths, rather than by the signal and noise strengths individually. We now define some standard measures for these quantities, and then express the performance of some common binary signaling schemes in terms of them.

Energy per bit, $E_b$: For binary signaling, this is given by

$$E_b = \frac{1}{2}\left(\|s_0\|^2 + \|s_1\|^2\right)$$

assuming  that  0  and  1  are  equally  likely  to  be  sent. 

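Scale-invariant parameters: If we scale up both $s_1$ and $s_0$ by a factor $A$, $E_b$ scales up by a factor $A^2$, while the distance $d$ scales up by a factor $A$. We can therefore define the scale-invariant parameter

$$\eta_P = \frac{d^2}{E_b} \qquad (6.39)$$

Now, substituting $d = \sqrt{\eta_P E_b}$ and $\sigma = \sqrt{N_0/2}$ into (6.34), we obtain that the ML performance is given by

$$P_{e,ML} = Q\left(\sqrt{\frac{\eta_P E_b}{2 N_0}}\right) = Q\left(\sqrt{\frac{d^2}{E_b}} \sqrt{\frac{E_b}{2 N_0}}\right) \qquad (6.40)$$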

Two  important  observations  follow. 

Performance depends on signal-to-noise ratio: We observe from (6.40) that the performance depends on the ratio $E_b/N_0$, rather than separately on the signal and noise strengths.

Power efficiency: For fixed $E_b/N_0$, the performance is better for a signaling scheme that has a higher value of $\eta_P$. We therefore use the term power efficiency for $\eta_P = d^2/E_b$.

Let  us  now  compute  the  performance  of  some  common  binary  signaling  schemes  in  terms  of 
$E_b/N_0$, using (6.40). Since inner products (and hence energies and distances) are preserved in signal space, we can compute $\eta_P$ for each scheme using the signal space representations depicted in Figure 6.19. The absolute scale of the signals is irrelevant, since the performance depends on the signaling scheme only through the scale-invariant parameter $\eta_P = d^2/E_b$. We can therefore choose any convenient scaling for the signal space representation for a modulation scheme.


Figure 6.19: Signal space representations with conveniently chosen scaling for three binary signaling schemes (panels: on-off keying; antipodal signaling; equal energy, orthogonal signaling).


On-off keying: Here $s_1(t) = s(t)$ and $s_0(t) = 0$. As shown in Figure 6.19, the signal space is one-dimensional. For the scaling in the figure, we have $d = 1$ and $E_b = \frac{1}{2}(1^2 + 0^2) = \frac{1}{2}$, so that $\eta_P = d^2/E_b = 2$. Substituting into (6.40), we obtain $P_{e,ML} = Q\left(\sqrt{E_b/N_0}\right)$.
Antipodal signaling: Here $s_1(t) = -s_0(t)$, leading again to a one-dimensional signal space representation. One possible realization of antipodal signaling is BPSK, discussed in the previous chapter. For the scaling chosen, $d = 2$ and $E_b = \frac{1}{2}(1^2 + (-1)^2) = 1$, which gives $\eta_P = d^2/E_b = 4$. Substituting into (6.40), we obtain $P_{e,ML} = Q\left(\sqrt{2E_b/N_0}\right)$.


Equal-energy, orthogonal signaling: Here $s_1$ and $s_0$ are orthogonal, with $\|s_1\|^2 = \|s_0\|^2$. This is a two-dimensional signal space. As discussed in the previous chapter, possible realizations of orthogonal signaling include FSK and Walsh-Hadamard codes. From Figure 6.19, we have $d = \sqrt{2}$ and $E_b = 1$, so that $\eta_P = d^2/E_b = 2$. This gives $P_{e,ML} = Q\left(\sqrt{E_b/N_0}\right)$.


Thus, on-off keying (which is orthogonal signaling with unequal energies) and equal-energy orthogonal signaling have the same power efficiency, while the power efficiency of antipodal signaling is a factor of two (i.e., 3 dB) better.
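The power efficiency computations for the three schemes can be reproduced with a few lines of code. This is a sketch under the scalings of Figure 6.19 as described in the text; the function names and the 7 dB operating point are illustrative assumptions.

```python
import math

def Q(x):
    """Gaussian tail probability P[N(0,1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def power_efficiency(s0, s1):
    """eta_P = d^2 / E_b for binary signaling with equally likely bits."""
    d2 = sum((a - b) ** 2 for a, b in zip(s0, s1))
    Eb = 0.5 * (sum(a * a for a in s0) + sum(b * b for b in s1))
    return d2 / Eb

def pe_ml(eta_p, ebn0_db):
    """Formula (6.40): Q(sqrt(eta_P * E_b / (2 N_0))), with E_b/N_0 given in dB."""
    ebn0 = 10 ** (ebn0_db / 10)
    return Q(math.sqrt(eta_p * ebn0 / 2))

schemes = {
    "on-off": ((0.0,), (1.0,)),
    "antipodal": ((-1.0,), (1.0,)),
    "orthogonal": ((1.0, 0.0), (0.0, 1.0)),
}
eta = {name: power_efficiency(s0, s1) for name, (s0, s1) in schemes.items()}

# The 3 dB advantage of antipodal signaling: orthogonal signaling needs
# 10*log10(2) dB more E_b/N_0 to reach the same error probability.
ebn0_db = 7.0  # hypothetical operating point
pe_antipodal = pe_ml(eta["antipodal"], ebn0_db)
pe_orthogonal_shifted = pe_ml(eta["orthogonal"], ebn0_db + 10 * math.log10(2))
```

Sweeping `ebn0_db` over a range and plotting `pe_ml` on a log scale would give curves of the kind described for Figure 6.20.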

In plots of error probability versus SNR, we typically express error probability on a log scale (in order to capture its rapid decay with SNR) and express SNR in decibels (in order to span a large range). We provide such a plot for antipodal and orthogonal signaling in Figure 6.20.


Figure  6.20:  Error  probability  versus  Eb/N0  (dB)  for  binary  antipodal  and  orthogonal  signaling 
schemes. 


6.3.3  M- ary  signaling:  scale-invariance  and  SNR 

We turn now to M-ary signaling with $M > 2$, modeled as the following hypothesis testing problem.

$$H_i : y(t) = s_i(t) + n(t), \quad i = 0, 1, \ldots, M-1$$

for which the ML rule has been derived to be

$$\delta_{ML}(y) = \arg\max_{0 \le i \le M-1} Z_i$$

with decision statistics

$$Z_i = \langle y, s_i \rangle - \frac{1}{2}\|s_i\|^2, \quad i = 0, 1, \ldots, M-1$$

and corresponding decision regions

$$\Gamma_i = \{y : Z_i > Z_j \text{ for all } j \ne i\}, \quad i = 0, 1, \ldots, M-1 \qquad (6.41)$$
Before doing detailed computations, let us discuss some general properties that greatly simplify
the  framework  for  performance  analysis. 

Scale Invariance: For binary signaling, we have observed through explicit computation of the error probability that performance depends only on signal-to-noise ratio ($E_b/N_0$) and the geometry of the signal set (which determines the power efficiency $d^2/E_b$). Actually, we can make such statements in great generality for M-ary signaling without explicit computations. First, let us note that the performance of an optimal receiver does not change if we scale both signal and noise by the same factor. Specifically, the performance of optimal reception for the model

$$H_i : y(t) = A s_i(t) + A n(t), \quad i = 0, 1, \ldots, M-1 \qquad (6.42)$$

does not depend on $A$. This is inferred from the following general observation: the performance of an optimal receiver is unchanged when we pass the observation through an invertible transformation. Specifically, suppose $z(t) = F(y(t))$ is obtained by passing $y(t)$ through an invertible transformation $F$. If the optimal receiver for $z$ does better than the optimal receiver for $y$, then we could apply $F$ to $y$ to get $z$, then do optimal reception for $z$. This would perform better than the optimal receiver for $y$, which is a contradiction. Similarly, if the optimal receiver for $y$ does better than the optimal receiver for $z$, then we could apply $F^{-1}$ to $z$ to get $y$, and then do optimal reception for $y$ to perform better than the optimal receiver for $z$, again a contradiction.

The  preceding  argument  implies  that  performance  depends  only  on  the  signal-to-noise  ratio, 
once  we  have  fixed  the  signal  constellation.  Let  us  now  figure  out  what  properties  of  the  signal 
constellation are relevant in determining performance. For $M = 2$, we have seen that all that matters is the scale-invariant quantity $d^2/E_b$. What are the analogous quantities for $M > 2$? To
determine  these,  let  us  consider  the  conditional  error  probabilities  for  the  ML  rule. 

Conditional  error  probability:  The  conditional  error  probability,  conditioned  on  Hi,  is  given 
by 

$$P_{e|i} = P[y \notin \Gamma_i \,|\, i \text{ sent}] = P[Z_i < Z_j \text{ for some } j \ne i \,|\, i \text{ sent}] \qquad (6.43)$$

While  computation  of  the  conditional  error  probability  in  closed  form  is  typically  not  feasible, 
we  can  actually  get  significant  insight  on  what  parameters  it  depends  on  by  examining  the 
conditional distributions of the decision statistics. Since $y = s_i + n$ conditioned on $H_i$, the decision statistics are given by

$$Z_j = \langle y, s_j \rangle - \frac{\|s_j\|^2}{2} = \langle s_i + n, s_j \rangle - \frac{\|s_j\|^2}{2} = \langle n, s_j \rangle + \langle s_i, s_j \rangle - \frac{\|s_j\|^2}{2}, \quad 0 \le j \le M-1$$

By  the  Gaussianity  of  n(t),  the  decision  statistics  {Zj}  are  jointly  Gaussian  (conditioned  on  Hi). 
Their  joint  distribution  is  therefore  completely  characterized  by  their  means  and  covariances. 
Since  the  noise  is  zero  mean,  we  obtain 


$$E[Z_j | H_i] = \langle s_i, s_j \rangle - \frac{\|s_j\|^2}{2}$$

Using  Theorem  6.2.1,  and  noting  that  covariance  is  unaffected  by  translation,  we  obtain  that 

$$\mathrm{cov}(Z_j, Z_k | H_i) = \mathrm{cov}(\langle n, s_j \rangle, \langle n, s_k \rangle) = \sigma^2 \langle s_j, s_k \rangle$$

Thus, conditioned on $H_i$, the joint distribution of $\{Z_j\}$ depends only on the noise variance $\sigma^2$ and the signal inner products $\{\langle s_i, s_j \rangle, 0 \le i, j \le M-1\}$. Now that we know the joint distribution, we can in principle compute the conditional error probabilities $P_{e|i}$. In practice, this is often difficult, and we often resort to Monte Carlo simulations. However, what we have found out about the joint distribution can now be used to refine our concepts of scale-invariance.
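A Monte Carlo estimate of the symbol error probability can be sketched directly from the decision statistics $Z_i$. The example uses a hypothetical QPSK scaling with equal priors and a fixed seed, and compares the estimate against the standard closed-form QPSK result $1 - (1 - Q(d/(2\sigma)))^2$, which is not derived in this section and is used here only as a sanity check.

```python
import math
import random

def Q(x):
    """Gaussian tail probability P[N(0,1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def simulate_ser(signals, sigma, trials, seed=1):
    """Estimate the ML symbol error probability with equal priors."""
    rng = random.Random(seed)
    half_energy = [inner(s, s) / 2 for s in signals]
    errors = 0
    for _ in range(trials):
        i = rng.randrange(len(signals))                       # transmitted index
        y = [a + rng.gauss(0.0, sigma) for a in signals[i]]  # add discrete time WGN
        z = [inner(y, s) - e for s, e in zip(signals, half_energy)]
        decision = max(range(len(z)), key=lambda k: z[k])
        if decision != i:
            errors += 1
    return errors / trials

qpsk = [(1.0, 1.0), (-1.0, 1.0), (-1.0, -1.0), (1.0, -1.0)]  # hypothetical scaling
sigma = 1.0
d_min = 2.0  # distance between nearest neighbors for this scaling

estimated = simulate_ser(qpsk, sigma, trials=100_000)
exact = 1 - (1 - Q(d_min / (2 * sigma))) ** 2
```

Estimating very small error probabilities this way requires many trials, since the number of observed errors, not the number of trials, controls the relative accuracy.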

Performance only depends on normalized inner products: Let us replace $Z_j$ by $Z_j/\sigma^2$. Clearly, since we are simply picking the maximum among the decision statistics, scaling by a common factor does not change the decision (and hence the performance). However, we now obtain that

$$E\left[\frac{Z_j}{\sigma^2} \,\Big|\, H_i\right] = \frac{\langle s_i, s_j \rangle}{\sigma^2} - \frac{\|s_j\|^2}{2\sigma^2}$$

and

$$\mathrm{cov}\left(\frac{Z_j}{\sigma^2}, \frac{Z_k}{\sigma^2} \,\Big|\, H_i\right) = \frac{1}{\sigma^4} \mathrm{cov}(Z_j, Z_k | H_i) = \frac{\langle s_j, s_k \rangle}{\sigma^2}$$

Thus, the joint distribution of the normalized decision statistics $\{Z_j/\sigma^2\}$, conditioned on any of the hypotheses, depends only on the normalized inner products $\{\langle s_i, s_j \rangle/\sigma^2, 0 \le i, j \le M-1\}$ (recall that $\|s_j\|^2 = \langle s_j, s_j \rangle$ is itself an inner product). Of course, this means that the performance also depends only on these normalized inner products.

Let  us  now  carry  these  arguments  further,  still  without  any  explicit  computations.  We  define 
energy  per  symbol  and  energy  per  bit  for  M- ary  signaling  as  follows. 


Energy per symbol, $E_s$: For M-ary signaling with equal priors, the energy per symbol $E_s$ is
given by

$$E_s = \frac{1}{M}\sum_{i=1}^{M} \|s_i\|^2$$

Energy per bit, $E_b$: Since M-ary signaling conveys $\log_2 M$ bits/symbol, the energy per bit is
given by

$$E_b = \frac{E_s}{\log_2 M}$$

If all signals in an M-ary constellation are scaled up by a factor $A$, then $E_s$ and $E_b$ get scaled
up by $A^2$, as do all inner products $\{\langle s_i, s_j\rangle\}$. Thus, we can define scale-invariant inner products
$\{\langle s_i, s_j\rangle/E_b\}$ which depend only on the shape of the signal constellation. Indeed, we can define the
shape of a constellation as these scale-invariant inner products. Setting $\sigma^2 = N_0/2$, we can now
write the normalized inner products determining performance as follows:

$$\frac{\langle s_i, s_j\rangle}{\sigma^2} = \frac{\langle s_i, s_j\rangle}{E_b}\,\frac{2E_b}{N_0}$$   (6.44)


We  can  now  make  the  following  statement. 

Performance  depends  only  on  Eb/N0  and  constellation  shape  (as  specified  by  the 
scale-invariant  inner  products):  We  have  shown  that  the  performance  depends  only  on  the 

normalized inner products $\{\langle s_i, s_j\rangle/\sigma^2\}$. From (6.44), we see that these in turn depend only on $E_b/N_0$
and the scale-invariant inner products $\{\langle s_i, s_j\rangle/E_b\}$. The latter depend only on the shape of the

signal  constellation,  and  are  completely  independent  of  the  signal  and  noise  strengths.  What 
this  means  is  that  we  can  choose  any  convenient  scaling  that  we  want  for  the  signal  constellation 
when  investigating  its  performance,  as  long  as  we  keep  track  of  the  signal-to-noise  ratio.  We 
illustrate  this  via  an  example  where  we  determine  the  error  probability  by  simulation. 
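
The scale-invariance claim can be checked numerically. The following Python sketch is an illustration only (the helper names `psk`, `inner`, `energy_per_bit` and `shape` are hypothetical, and equal priors are assumed): it builds 8PSK at two different radii and compares the scale-invariant inner products $\langle s_i, s_j\rangle/E_b$ for the two scalings.

```python
import math

def psk(M, radius):
    """M-ary PSK constellation as a list of 2-D points (illustrative helper)."""
    return [(radius * math.cos(2 * math.pi * k / M),
             radius * math.sin(2 * math.pi * k / M)) for k in range(M)]

def inner(u, v):
    """Inner product of two real vectors."""
    return sum(a * b for a, b in zip(u, v))

def energy_per_bit(points):
    """E_b = E_s / log2(M), with E_s the average energy under equal priors."""
    M = len(points)
    Es = sum(inner(s, s) for s in points) / M
    return Es / math.log2(M)

def shape(points):
    """Matrix of scale-invariant inner products <s_i, s_j> / E_b."""
    Eb = energy_per_bit(points)
    return [[inner(si, sj) / Eb for sj in points] for si in points]

small = shape(psk(8, 1.0))   # unit-radius 8PSK
large = shape(psk(8, 5.0))   # same shape, scaled up by a factor of 5
max_diff = max(abs(a - b) for ra, rb in zip(small, large) for a, b in zip(ra, rb))
print(max_diff)  # numerically zero: the shape does not depend on the scaling
```

Since the two matrices agree up to rounding, any convenient scaling can be used in a simulation, provided the noise variance is set from the desired $E_b/N_0$.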


Example 6.3.1 (Using scale-invariance in error probability simulations): Suppose that
we wish to estimate the error probability for 8PSK by simulation. The signal points lie in a
2-dimensional space, and we can scale them to lie on a circle of unit radius, so that the constellation
is given by $A = \{(\cos\theta, \sin\theta)^T : \theta = k\pi/4, k = 0, 1, \ldots, 7\}$. The energy per symbol $E_s = 1$ for
this scaling, so that $E_b = E_s/\log_2 8 = 1/3$. We therefore have $E_b/N_0 = 1/(3N_0) = 1/(6\sigma^2)$, so
that the noise variance per dimension can be set to

$$\sigma^2 = \frac{1}{6(E_b/N_0)}$$

Typically, $E_b/N_0$ is specified in dB, so we need to convert it to the “raw” $E_b/N_0$. We now have
a simulation consisting of the following steps, repeated over multiple symbol transmissions:

Step 1: Choose a symbol $s$ at random from $A$. For this symmetric constellation, we can actually
keep sending the same symbol in order to compute the performance of the ML rule, since the
conditional error probabilities are all equal. For example, set $s = (1, 0)^T$.

Step 2: Generate two i.i.d. $N(0, 1)$ random variables $U_c$ and $U_s$. The I and Q noises can now be
set as $N_c = \sigma U_c$ and $N_s = \sigma U_s$, so that $N = (N_c, N_s)^T$.

Step 3: Set the received vector $y = s + N$.

Step 4: Compute the ML decision $\arg\max_i \langle y, s_i\rangle$ (the energy terms can be dropped, since the
signals are of equal energy) or $\arg\min_i \|y - s_i\|^2$.

Step  5:  If  there  is  an  error,  increment  the  error  count. 

The error probability is estimated as the error count, divided by the number of symbols transmitted.
We repeat this simulation over a range of $E_b/N_0$, and typically plot the error probability
on  a  log  scale  versus  Eb/N0  in  dB. 

These  steps  are  carried  out  in  the  following  code  fragment,  which  generates  Figure  6.21  comparing 
a  simulation-based  estimate  of  the  error  probability  for  8PSK  against  the  intelligent  union  bound, 
an  analytical  estimate  that  we  develop  shortly.  The  analytical  estimate  requires  very  little 
computation  (evaluation  of  a  single  Q  function),  but  its  agreement  with  simulations  is  excellent. 
As  we  shall  see,  developing  such  analytical  estimates  also  gives  us  insight  into  how  errors  are 
most  likely  to  occur  for  M- ary  signaling  in  AWGN. 

The  code  fragment  is  written  for  transparency  rather  than  computational  efficiency.  The  code 
contains an outer for-loop for varying SNR, and an inner for-loop for computing minimum distances
for the symbols sent at each SNR. The inner loop can be avoided and the program sped up
considerably  by  computing  all  minimum  distances  for  all  symbols  at  once  using  matrix  operations 
(try  it!).  We  use  a  less  efficient  program  here  to  make  the  operations  easy  to  understand. 


Figure  6.21:  Error  probability  for  8PSK. 


Code Fragment 6.3.1 (Simulation of 8PSK performance in AWGN)

%generate 8PSK constellation as complex numbers
a=cumsum(ones(8,1))-1;
constellation = exp(1i*2*pi.*a/8);

%number of symbols in simulation
nsymbols = 20000;
ebnodb = 0:0.1:10;
number_snrs = length(ebnodb);
perr_estimate = zeros(number_snrs,1);
for k=1:number_snrs, %SNR for loop
    ebnodb_now = ebnodb(k);
    ebno=10^(ebnodb_now/10);
    sigma=sqrt(1/(6*ebno));

    %send first symbol without loss of generality, add 2d Gaussian noise
    received = 1 + sigma*randn(nsymbols,1)+1i*sigma*randn(nsymbols,1);
    decisions=zeros(nsymbols,1);

    for n=1:nsymbols, %Symbol for loop (can/should be avoided for fast implementation)
        distances = abs(received(n)-constellation);
        [min_dist,decisions(n)] = min(distances);
    end

    errors = (decisions ~= 1);
    perr_estimate(k) = sum(errors)/nsymbols;
end

semilogy(ebnodb,perr_estimate);
hold on;

%COMPARE WITH INTELLIGENT UNION BOUND
etaP = 6-3*sqrt(2); %power efficiency
Ndmin = 2; %number of nearest neighbors
ebno = 10.^(ebnodb/10);

perr_union = Ndmin*q_function(sqrt(etaP*ebno/2));
semilogy(ebnodb,perr_union,':r');
xlabel('Eb/N0 (dB)');
ylabel('Symbol error probability');

legend('Simulation','Intelligent Union Bound','Location','NorthEast');


6.3.4 Performance analysis for M-ary signaling

We begin by computing the error probability for QPSK, for which we can get simple expressions
for the error probability in terms of the Q function. We then discuss why exact performance analysis
can be more complicated in general, motivating the need for the bounds and approximations
we develop in this section.

Figure 6.22: If $s_0$ is sent, an error occurs if $N_c$ or $N_s$ is negative enough to make the received
vector fall out of the first quadrant.


Exact analysis for QPSK: Let us find $P_{e|0}$, the conditional error probability for the ML rule
conditioned on $s_0$ being sent. For the scaling shown in Figure 6.22, $s_0 = (d/2, d/2)^T$
and the two-dimensional received vector is given by

$$y = s_0 + (N_c, N_s)^T = \left(\frac{d}{2} + N_c,\; \frac{d}{2} + N_s\right)^T$$

where $N_c$, $N_s$ are i.i.d. $N(0, \sigma^2)$ random variables, corresponding to the projections of WGN
along the I and Q axes, respectively. An error occurs if the noise moves the observation out
of the positive quadrant, which is the decision region for $s_0$. This happens if $N_c + d/2 < 0$ or
$N_s + d/2 < 0$. We can therefore write

$$P_{e|0} = P[N_c + \tfrac{d}{2} < 0 \text{ or } N_s + \tfrac{d}{2} < 0] = P[N_c + \tfrac{d}{2} < 0] + P[N_s + \tfrac{d}{2} < 0] - P[N_c + \tfrac{d}{2} < 0 \text{ and } N_s + \tfrac{d}{2} < 0]$$   (6.45)

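But

$$P[N_c + \tfrac{d}{2} < 0] = P[N_c < -\tfrac{d}{2}] = Q\left(\frac{d}{2\sigma}\right)$$

This is also equal to $P[N_s + d/2 < 0]$, since $N_c$, $N_s$ are identically distributed. Furthermore, since
$N_c$, $N_s$ are independent, we have

$$P[N_c + \tfrac{d}{2} < 0 \text{ and } N_s + \tfrac{d}{2} < 0] = P[N_c + \tfrac{d}{2} < 0]\,P[N_s + \tfrac{d}{2} < 0] = Q^2\left(\frac{d}{2\sigma}\right)$$

Substituting these expressions into (6.45), we obtain that

$$P_{e|0} = 2Q\left(\frac{d}{2\sigma}\right) - Q^2\left(\frac{d}{2\sigma}\right)$$   (6.46)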


By symmetry, the conditional probabilities $P_{e|i}$ are equal for all $i$, which implies that the average
error probability is also given by the expression above. We now express the error probability in
terms of the scale-invariant parameter $d^2/E_b$ and $E_b/N_0$, using the relation

$$\frac{d}{2\sigma} = \sqrt{\frac{d^2}{E_b}}\,\sqrt{\frac{E_b}{2N_0}}$$

The energy per symbol is given by

$$E_s = \|s_0\|^2 = \left(\frac{d}{2}\right)^2 + \left(\frac{d}{2}\right)^2 = \frac{d^2}{2}$$

which implies that the energy per bit is

$$E_b = \frac{E_s}{\log_2 M} = \frac{E_s}{\log_2 4} = \frac{d^2}{4}$$

This yields $d^2/E_b = 4$, and hence $\frac{d}{2\sigma} = \sqrt{\frac{2E_b}{N_0}}$. Substituting into (6.46), we obtain

$$P_e = P_{e|0} = 2Q\left(\sqrt{\frac{2E_b}{N_0}}\right) - Q^2\left(\sqrt{\frac{2E_b}{N_0}}\right)$$   (6.47)

as the exact error probability for QPSK.

Figure 6.23: The noise random variables $N_1$, $N_2$, $N_3$ which can drive the received vector outside
the decision region $\Gamma_0$ are correlated, which makes it difficult to find an exact expression for $P_{e|0}$.
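
As a quick numerical companion to the exact QPSK result, the following Python sketch evaluates $P_e = 2Q(\sqrt{2E_b/N_0}) - Q^2(\sqrt{2E_b/N_0})$ for $E_b/N_0$ given in dB. The function names are hypothetical, and the Q function is computed from the standard identity $Q(x) = \frac{1}{2}\mathrm{erfc}(x/\sqrt{2})$.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = P[N(0,1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def qpsk_exact(ebno_db):
    """Exact QPSK symbol error probability for Eb/N0 specified in dB."""
    ebno = 10 ** (ebno_db / 10)              # convert dB to the raw ratio
    q = q_function(math.sqrt(2 * ebno))      # Q(d / (2 sigma)) with d^2/Eb = 4
    return 2 * q - q ** 2

for ebno_db in (0, 4, 8):
    print(ebno_db, qpsk_exact(ebno_db))
```

The subtracted $Q^2$ term is the probability that both noise components cause a boundary crossing, so it becomes negligible relative to $2Q$ as $E_b/N_0$ grows.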

Why  exact  analysis  can  be  difficult:  Let  us  first  understand  why  we  could  find  a  simple 
expression  for  the  error  probability  for  QPSK.  The  decision  regions  are  bounded  by  the  I  and 
Q  axes.  The  noise  random  variable  Nc  can  cause  crossing  of  the  Q  axis,  while  Ns  can  cause 
crossing  of  the  I  axis.  Since  these  two  random  variables  are  independent,  the  probability  that  at 
least  one  of  these  noise  random  variables  causes  a  boundary  crossing  becomes  easy  to  compute. 
Figure 6.23 shows an example where this is not possible. In the figure, we see that the decision
region $\Gamma_0$ is bounded by three lines (in general, these would be $(n-1)$-dimensional hyperplanes
in $n$-dimensional signal space). An error occurs if we cross any of these lines, starting from $s_0$.
In order to cross the line between $s_0$ and $s_i$, the noise random variable $N_i$ must be bigger than
$\|s_i - s_0\|/2$, $i = 1, 2, 3$ (as we saw in Figures 6.17 and 6.18, only the noise component orthogonal
to a hyperplane determines whether we cross it). Thus, the conditional error probability can be
written as

$$P_{e|0} = P[N_1 > \|s_1 - s_0\|/2 \text{ or } N_2 > \|s_2 - s_0\|/2 \text{ or } N_3 > \|s_3 - s_0\|/2]$$   (6.48)

The random variables $N_1, N_2, N_3$ are, of course, jointly Gaussian, since each is a projection
of WGN along a direction. Each of them is an $N(0, \sigma^2)$ random variable; that is, they are
identically distributed. However, they are not independent, since they are projections of WGN
along directions that are not orthogonal to each other. Thus, we cannot break down the preceding
expression into probabilities in terms of the individual random variables $N_1, N_2, N_3$, unlike what
we did for QPSK (where $N_c, N_s$ were independent). However, we can still find a simple upper
bound on the conditional error probability using the union bound, as follows.

Union  Bound:  The  probability  of  a  union  of  events  is  upper  bounded  by  the  sum  of  the 
probabilities  of  the  events. 

$$P[A_1 \text{ or } A_2 \text{ or } \ldots \text{ or } A_n] = P[A_1 \cup A_2 \cup \ldots \cup A_n] \le P[A_1] + P[A_2] + \ldots + P[A_n]$$   (6.49)


Applying (6.49) to (6.48), we obtain that, for the scenario depicted in Figure 6.23, the conditional
error probability can be upper bounded as follows:

$$P_{e|0} \le P[N_1 > \|s_1 - s_0\|/2] + P[N_2 > \|s_2 - s_0\|/2] + P[N_3 > \|s_3 - s_0\|/2]$$
$$= Q\left(\frac{\|s_1 - s_0\|}{2\sigma}\right) + Q\left(\frac{\|s_2 - s_0\|}{2\sigma}\right) + Q\left(\frac{\|s_3 - s_0\|}{2\sigma}\right)$$   (6.50)

Thus, the conditional error probability is upper bounded by a sum of probabilities, each of which
corresponds to the error probability for a binary decision: $s_0$ versus $s_1$, $s_0$ versus $s_2$, and $s_0$ versus
$s_3$. This approach applies in great generality, as we show next.

Union  Bound  and  variants:  Pictures  such  as  the  one  in  Figure  6.23  typically  cannot  be 
drawn  when  the  signal  space  dimension  is  high.  However,  we  can  still  find  union  bounds  on  error 
probabilities,  as  long  as  we  can  enumerate  all  the  signals  in  the  constellation.  To  do  this,  let  us 
rewrite (6.43), the conditional error probability, conditioned on $H_i$, as a union of $M - 1$ events
as follows:

$$P_{e|i} = P[\cup_{j \neq i}\{Z_i < Z_j\} \mid i \text{ sent}]$$

where $\{Z_j\}$ are the decision statistics. Using the union bound (6.49), we obtain

$$P_{e|i} \le \sum_{j \neq i} P[Z_i < Z_j \mid i \text{ sent}]$$   (6.51)

But the $j$th term on the right-hand side above is simply the error probability of ML reception
for binary hypothesis testing between the signals $s_i$ and $s_j$. From the results of Section 6.3.2, we
therefore obtain the following pairwise error probability:

$$P[Z_i < Z_j \mid i \text{ sent}] = Q\left(\frac{\|s_j - s_i\|}{2\sigma}\right)$$

Substituting  into  (6.51),  we  obtain  upper  bounds  on  the  conditional  error  probabilities  and  the 
average  error  probability  as  follows. 


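Union Bound on conditional error probabilities: The conditional error probabilities
for the ML rule are bounded as

$$P_{e|i} \le \sum_{j \neq i} Q\left(\frac{\|s_j - s_i\|}{2\sigma}\right) = \sum_{j \neq i} Q\left(\frac{d_{ij}}{2\sigma}\right)$$   (6.52)

where $d_{ij} = \|s_i - s_j\|$ is the distance between signals $s_i$ and $s_j$.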

Union bound on average error probability: Averaging the conditional error using
the prior probabilities gives an upper bound on the average error probability as follows:

$$P_e = \sum_i \pi_i P_{e|i} \le \sum_i \pi_i \sum_{j \neq i} Q\left(\frac{\|s_j - s_i\|}{2\sigma}\right) = \sum_i \pi_i \sum_{j \neq i} Q\left(\frac{d_{ij}}{2\sigma}\right)$$   (6.53)


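We can now rewrite the union bound in terms of $E_b/N_0$ and the scale-invariant squared
distances $d_{ij}^2/E_b$ as follows:

$$P_{e|i} \le \sum_{j \neq i} Q\left(\sqrt{\frac{d_{ij}^2}{E_b}}\,\sqrt{\frac{E_b}{2N_0}}\right)$$   (6.54)

$$P_e = \sum_i \pi_i P_{e|i} \le \sum_i \pi_i \sum_{j \neq i} Q\left(\sqrt{\frac{d_{ij}^2}{E_b}}\,\sqrt{\frac{E_b}{2N_0}}\right)$$   (6.55)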
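
The union bound on the average error probability is straightforward to compute once the signal points can be enumerated. The following Python sketch is a minimal illustration under stated assumptions: equal priors, a hypothetical helper `union_bound`, and a QPSK constellation scaled so that the minimum distance is 2. It sums $Q(d_{ij}/(2\sigma))$ over all pairs and compares the result with the exact QPSK expression $2Q(d/(2\sigma)) - Q^2(d/(2\sigma))$.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def union_bound(points, sigma):
    """Average union bound with equal priors:
    (1/M) * sum_i sum_{j != i} Q(d_ij / (2 sigma))."""
    M = len(points)
    total = 0.0
    for i, si in enumerate(points):
        for j, sj in enumerate(points):
            if j != i:
                total += q_function(math.dist(si, sj) / (2 * sigma))
    return total / M

# QPSK with minimum distance d = 2 (an assumed, convenient scaling)
qpsk = [(1, 1), (-1, 1), (-1, -1), (1, -1)]
sigma = 0.5
bound = union_bound(qpsk, sigma)
q = q_function(2 / (2 * sigma))
exact = 2 * q - q ** 2   # exact QPSK symbol error probability, for comparison
print(bound, exact)
```

For this example the bound equals $2Q(d/(2\sigma)) + Q(\sqrt{2}d/(2\sigma))$, which exceeds the exact value, as any upper bound must.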


Applying the union bound to Figure 6.23, we obtain

$$P_{e|0} \le Q\left(\frac{\|s_1 - s_0\|}{2\sigma}\right) + Q\left(\frac{\|s_2 - s_0\|}{2\sigma}\right) + Q\left(\frac{\|s_3 - s_0\|}{2\sigma}\right) + Q\left(\frac{\|s_4 - s_0\|}{2\sigma}\right)$$

Notice that this answer is different from the one we had in (6.50). This is because the fourth
term corresponds to the signal $s_4$, which is “too far away” from $s_0$ to play a role in determining
the decision region $\Gamma_0$. Thus, when we do have a more detailed geometric understanding of the
decision regions, we can do better than the generic union bound (6.52) and get a tighter bound,
as in (6.50). We term this the intelligent union bound, and give a general formulation in the
following.

Denote by $N_{ml}(i)$ the indices of the set of neighbors of signal $s_i$ (we exclude $i$ from $N_{ml}(i)$ by
definition) that characterize the ML decision region $\Gamma_i$. That is, the half-planes that we intersect
to obtain $\Gamma_i$ correspond to the perpendicular bisectors of lines joining $s_i$ and $s_j$, $j \in N_{ml}(i)$. For
example, in Figure 6.23, $N_{ml}(0) = \{1, 2, 3\}$; $s_4$ is excluded from this set, since it does not play a
role in determining $\Gamma_0$. The decision region in (6.41) can now be expressed as

$$\Gamma_i = \{y : \delta_{ML}(y) = i\} = \{y : Z_i \ge Z_j \text{ for all } j \in N_{ml}(i)\}$$   (6.56)

We can now say the following: $y$ falls outside $\Gamma_i$ if and only if $Z_i < Z_j$ for some $j \in N_{ml}(i)$. We
can therefore write

$$P_{e|i} = P[y \notin \Gamma_i \mid i \text{ sent}] = P[Z_i < Z_j \text{ for some } j \in N_{ml}(i) \mid i \text{ sent}]$$   (6.57)


and  from  there,  following  the  same  steps  as  in  the  union  bound,  get  a  tighter  bound,  which  we 
express  as  follows. 

Intelligent Union Bound: A better bound on $P_{e|i}$ is obtained by considering only
the neighbors of $s_i$ that determine its ML decision region, as follows:

$$P_{e|i} \le \sum_{j \in N_{ml}(i)} Q\left(\frac{\|s_j - s_i\|}{2\sigma}\right)$$   (6.58)

In terms of $E_b/N_0$, we get

$$P_{e|i} \le \sum_{j \in N_{ml}(i)} Q\left(\sqrt{\frac{d_{ij}^2}{E_b}}\,\sqrt{\frac{E_b}{2N_0}}\right)$$

(the bound on the average error probability $P_e$ is computed as before by averaging the
bounds on $P_{e|i}$ using the priors).

Union Bound for QPSK: For QPSK, we infer from Figure 6.22 that the union bound for $P_{e|0}$
is given by

$$P_e = P_{e|0} \le Q\left(\frac{d_{01}}{2\sigma}\right) + Q\left(\frac{d_{02}}{2\sigma}\right) + Q\left(\frac{d_{03}}{2\sigma}\right) = 2Q\left(\frac{d}{2\sigma}\right) + Q\left(\frac{\sqrt{2}\,d}{2\sigma}\right)$$   (6.59)

Using $d^2/E_b = 4$, we obtain the union bound in terms of $E_b/N_0$ to be

$$P_e \le 2Q\left(\sqrt{\frac{2E_b}{N_0}}\right) + Q\left(\sqrt{\frac{4E_b}{N_0}}\right) \quad \text{QPSK union bound}$$   (6.60)

For moderately large $E_b/N_0$, the dominant term in terms of the decay of the error probability is
the first one, since $Q(x)$ falls off rapidly as $x$ gets large. Thus, while the union bound (6.60) is
larger than the exact error probability (6.47), as it must be, it gets the multiplicity and argument
of the dominant term right. Tightening the analysis using the intelligent union bound, we get

$$P_{e|0} \le 2Q\left(\sqrt{\frac{2E_b}{N_0}}\right) \quad \text{QPSK intelligent union bound}$$   (6.61)

since $N_{ml}(0) = \{1, 2\}$ (the decision region for $s_0$ is determined by the neighbors $s_1$ and $s_2$).

Another  common  approach  for  getting  a  better  (and  quicker  to  compute)  estimate  than  the 
original  union  bound  is  the  nearest  neighbors  approximation.  This  is  a  loose  term  employed  to 
describe  a  number  of  different  methods  for  pruning  the  terms  in  the  summation  (6.52).  Most 
commonly, it refers to regular signal sets in which each signal point has a number of nearest
neighbors at distance $d_{\min}$ from it, where $d_{\min} = \min_{i \neq j}\|s_i - s_j\|$. Letting $N_{d_{\min}}(i)$ denote the
number of nearest neighbors of $s_i$, we obtain the following approximation.

Nearest Neighbors Approximation

$$P_{e|i} \approx N_{d_{\min}}(i)\,Q\left(\frac{d_{\min}}{2\sigma}\right)$$   (6.62)

Averaging over $i$, we obtain that

$$P_e \approx \bar{N}_{d_{\min}}\,Q\left(\frac{d_{\min}}{2\sigma}\right)$$   (6.63)

where $\bar{N}_{d_{\min}}$ denotes the average number of nearest neighbors for a signal point. The rationale
for the nearest neighbors approximation is that, since $Q(x)$ decays rapidly, $Q(x) \sim e^{-x^2/2}$, as
$x$ gets large, the terms in the union bound corresponding to the smallest arguments for the Q
function dominate at high SNR.

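The corresponding formulas as a function of scale-invariant quantities and $E_b/N_0$ are:

$$P_{e|i} \approx N_{d_{\min}}(i)\,Q\left(\sqrt{\frac{d_{\min}^2}{E_b}}\,\sqrt{\frac{E_b}{2N_0}}\right)$$   (6.64)

It is also worth explicitly writing down an expression for the average error probability, averaging
the preceding over $i$:

$$P_e \approx \bar{N}_{d_{\min}}\,Q\left(\sqrt{\frac{d_{\min}^2}{E_b}}\,\sqrt{\frac{E_b}{2N_0}}\right)$$   (6.65)

where

$$\bar{N}_{d_{\min}} = \frac{1}{M}\sum_{i=1}^{M} N_{d_{\min}}(i)$$

is the average number of nearest neighbors for the signal points in the constellation.
For QPSK, we have from Figure 6.22 that

$$N_{d_{\min}}(i) \equiv 2 = \bar{N}_{d_{\min}}$$

and

$$\frac{d_{\min}^2}{E_b} = \frac{d^2}{E_b} = 4$$

yielding

$$P_e \approx 2Q\left(\sqrt{\frac{2E_b}{N_0}}\right)$$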

In  this  case,  the  nearest  neighbors  approximation  coincides  with  the  intelligent  union  bound 
(6.61).  This  happens  because  the  ML  decision  region  for  each  signal  point  is  determined  by  its 
nearest  neighbors  for  QPSK.  Indeed,  the  latter  property  holds  for  many  regular  constellations, 
including  all  of  the  PSK  and  QAM  constellations  whose  ML  decision  regions  are  depicted  in 
Figure  6.16. 

Power Efficiency: While exact performance analysis for M-ary signaling can be computationally
demanding, we have now obtained simple enough estimates that we can define concepts such
as power efficiency, analogous to the development for binary signaling. In particular, comparing
the nearest neighbors approximation (6.63) with the error probability for binary signaling (6.40),
we define in analogy the power efficiency of an M-ary signaling scheme as

$$\eta_P = \frac{d_{\min}^2}{E_b}$$   (6.66)

We can rewrite the nearest neighbors approximation as

$$P_e \approx \bar{N}_{d_{\min}}\,Q\left(\sqrt{\frac{\eta_P E_b}{2N_0}}\right)$$   (6.67)

Since the argument of the Q function in (6.67) plays a bigger role than the multiplicity $\bar{N}_{d_{\min}}$ for
moderately large SNR, $\eta_P$ offers a means of quickly comparing the power efficiency of different
signaling constellations, as well as for determining the dependence of performance on $E_b/N_0$.

Figure 6.24: ML decision regions for 16QAM with scaling chosen for convenience in computing
power efficiency.


Performance  analysis  for  16QAM:  We  now  apply  the  preceding  performance  analysis  to  the 
16QAM  constellation  depicted  in  Figure  6.24,  where  we  have  chosen  a  convenient  scale  for  the 
constellation.  We  now  compute  the  nearest  neighbors  approximation,  which  coincides  with  the 
intelligent  union  bound,  since  the  ML  decision  regions  are  determined  by  the  nearest  neighbors. 
Noting  that  the  number  of  nearest  neighbors  is  four  for  the  four  innermost  signal  points,  two  for 
the  four  outermost  signal  points,  and  three  for  the  remaining  eight  signal  points,  we  obtain  upon 
averaging 

$$\bar{N}_{d_{\min}} = 3$$   (6.68)

It remains to compute the power efficiency $\eta_P$ and apply (6.67). We had done this in the preview
in Chapter 4, but we repeat it here. For the scaling shown, we have $d_{\min} = 2$. The energy per
symbol is obtained as follows:

$$E_s = \text{average energy of I component} + \text{average energy of Q component} = 2\,(\text{average energy of I component})$$

by symmetry. Since the I component is equally likely to take the four values ±1 and ±3, we have

$$\text{average energy of I component} = \frac{1}{2}(1^2 + 3^2) = 5$$

and

$$E_s = 10$$

We therefore obtain

$$E_b = \frac{E_s}{\log_2 M} = \frac{10}{\log_2 16} = \frac{5}{2}$$

The power efficiency is therefore given by

$$\eta_P = \frac{d_{\min}^2}{E_b} = \frac{2^2}{5/2} = \frac{8}{5}$$   (6.69)

Substituting (6.68) and (6.69) into (6.67), we obtain that

$$P_e(\text{16QAM}) \approx 3Q\left(\sqrt{\frac{4E_b}{5N_0}}\right)$$   (6.70)


as  the  nearest  neighbors  approximation  and  intelligent  union  bound  for  16QAM.  The  bandwidth 
efficiency  for  16QAM  is  4  bits/2  dimensions,  which  is  twice  that  of  QPSK,  whose  bandwidth 
efficiency  is  2  bits/2  dimensions.  It  is  not  surprising,  therefore,  that  the  power  efficiency  of 
16QAM ($\eta_P = 1.6$) is smaller than that of QPSK ($\eta_P = 4$). We often encounter such tradeoffs
between  power  and  bandwidth  efficiency  in  the  design  of  communication  systems,  including  when 
the  signaling  waveforms  considered  are  sophisticated  codes  that  are  constructed  from  multiple 
symbols  drawn  from  constellations  such  as  PSK  and  QAM. 
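
The quantities used in this analysis ($d_{\min}$, the average number of nearest neighbors, and the power efficiency $d_{\min}^2/E_b$) can be computed mechanically from the signal points. The Python sketch below is an illustration under assumptions: equal priors, a hypothetical helper `nearest_neighbor_stats`, and the scalings in which 16QAM takes I and Q values in {±1, ±3} and QPSK takes values in {±1}.

```python
import math
from itertools import product

def nearest_neighbor_stats(points):
    """Return (d_min, average number of nearest neighbors, d_min^2 / E_b)."""
    M = len(points)
    dists = [[math.dist(p, r) for r in points] for p in points]
    d_min = min(dists[i][j] for i in range(M) for j in range(M) if i != j)
    tol = 1e-9  # tolerance for floating-point comparison of distances
    counts = [sum(1 for j in range(M) if j != i and abs(dists[i][j] - d_min) < tol)
              for i in range(M)]
    Es = sum(x * x + y * y for x, y in points) / M   # average symbol energy
    Eb = Es / math.log2(M)                            # energy per bit
    return d_min, sum(counts) / M, d_min ** 2 / Eb

qam16 = list(product((-3, -1, 1, 3), repeat=2))
qpsk = list(product((-1, 1), repeat=2))
d16, n16, eta16 = nearest_neighbor_stats(qam16)
d4, n4, eta4 = nearest_neighbor_stats(qpsk)
gap_db = 10 * math.log10(eta4 / eta16)   # power efficiency gap in dB
print(n16, eta16, n4, eta4, gap_db)
```

For these constellations the sketch gives an average of 3 nearest neighbors and a power efficiency of 1.6 for 16QAM, against 2 and 4 for QPSK, so the power efficiency gap is about 4 dB.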


Figure  6.25:  Symbol  error  probabilities  for  QPSK  and  16QAM. 


Figure 6.25 shows the symbol error probabilities for QPSK and 16QAM, comparing the intelligent
union bounds (which coincide with nearest neighbors approximations) with exact results. The
exact computations for 16QAM use the closed form expression (6.70) derived in Problem 6.21. We
see that the exact error probability and intelligent union bound are virtually indistinguishable.
The power efficiencies of the constellations (which depend on the argument of the Q function)
accurately predict the distance between the curves: $\eta_P(\text{QPSK})/\eta_P(\text{16QAM}) = 4/1.6 = 2.5$, which equals about 4 dB.

From Figure 6.25, we see that the distance between the QPSK and 16QAM curves at small error
probabilities (high SNR) is indeed about 4 dB.




Figure  6.26:  Performance  analysis  for  BPSK  with  phase  offset. 


The performance analysis techniques developed here can also be applied to suboptimal receivers.
Suppose, for example, that the receiver LO in a BPSK system is offset from the incoming carrier
by a phase shift $\theta$, but that the receiver uses decision regions corresponding to no phase offset.
The signal space picture is now as in Figure 6.26. The error probability is now given by

$$P_e = P_{e|0} = P_{e|1} = Q\left(\frac{D}{\sigma}\right) = Q\left(\sqrt{\frac{D^2}{E_b}\,\frac{2E_b}{N_0}}\right)$$

For the scaling shown, $D = \cos\theta$ and $E_b = 1$, which gives

$$P_e = Q\left(\sqrt{\frac{2E_b\cos^2\theta}{N_0}}\right)$$

so that there is a loss of $10\log_{10}\cos^2\theta$ dB in performance due to the phase offset (e.g. $\theta = 10°$
leads to a loss of 0.13 dB, while $\theta = 30°$ leads to a loss of 1.25 dB).
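
The quoted loss figures can be reproduced with a few lines of Python. This is a small illustrative sketch: the function name `phase_offset_loss_db` is hypothetical, and the loss is reported as a positive number of dB, that is, as the magnitude of 10 log10 of the squared cosine of the phase offset.

```python
import math

def phase_offset_loss_db(offset_deg):
    """Loss in dB due to a phase offset given in degrees (positive number)."""
    return -10 * math.log10(math.cos(math.radians(offset_deg)) ** 2)

for offset in (10, 30):
    print(offset, round(phase_offset_loss_db(offset), 2))
# prints:
# 10 0.13
# 30 1.25
```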

6.3.5 Performance analysis for M-ary orthogonal modulation

So far, our examples have focused on two-dimensional modulation, which is what we use when
our primary concern is bandwidth efficiency. We now turn our attention to equal energy, M-ary
orthogonal signaling, which, as we have mentioned before, lies at the other extreme of the
power-bandwidth tradeoff space: as $M \to \infty$, the power efficiency reaches the highest possible
value of any signaling scheme over the AWGN channel, while the bandwidth efficiency tends to
zero. The signal space is M-dimensional in this case, but we can actually get expressions for
the probability of error that involve a single integral rather than M-dimensional integrals, by
exploiting the orthogonality of the signal constellation.




Let us first quickly derive the union bound. Without loss of generality, take the M orthogonal
signals as unit vectors along the M axes in our signal space. With this scaling, we have $\|s_i\|^2 = 1$,
so that $E_s = 1$ and $E_b = \frac{1}{\log_2 M}$. Since the signals are orthogonal, the squared distance between
any two signals is

$$d_{ij}^2 = \|s_i - s_j\|^2 = \|s_i\|^2 + \|s_j\|^2 - 2\langle s_i, s_j\rangle = 2E_s = 2, \quad i \neq j$$

Thus, $d_{\min} \equiv d_{ij}$ ($i \neq j$) and the power efficiency is

$$\eta_P = \frac{d_{\min}^2}{E_b} = 2\log_2 M$$

The union bound, intelligent union bound and nearest neighbors approximation all coincide, and
we get

$$P_e = P_{e|i} \le \sum_{j \neq i} Q\left(\frac{d_{ij}}{2\sigma}\right) = (M-1)\,Q\left(\sqrt{\frac{d_{\min}^2}{E_b}}\,\sqrt{\frac{E_b}{2N_0}}\right)$$

We now get the following expression in terms of $E_b/N_0$.

Union bound on error probability for M-ary orthogonal signaling

$$P_e \le (M-1)\,Q\left(\sqrt{\frac{E_b \log_2 M}{N_0}}\right)$$   (6.71)

Exact  expressions:  By  symmetry,  the  error  probability  equals  the  conditional  error  probability, 
conditioned  on  any  one  of  the  hypotheses;  similarly,  the  probability  of  correct  decision  equals 
the probability of correct decision given any of the hypotheses. Let us therefore condition on
hypothesis $H_0$ (i.e., that $s_0$ is sent), so that the received signal $y = s_0 + n$. The decision statistics are

$$Z_i = \langle s_0 + n, s_i\rangle = E_s\delta_{0i} + N_i, \quad i = 0, 1, \ldots, M-1$$

where $\{N_i = \langle n, s_i\rangle\}$ are jointly Gaussian, zero mean, with

$$\mathrm{cov}(N_i, N_j) = \sigma^2\langle s_i, s_j\rangle = \sigma^2 E_s \delta_{ij}$$

Thus, $N_i \sim N(0, \sigma^2 E_s)$ are i.i.d. We therefore infer that, conditioned on $s_0$ sent, the $\{Z_i\}$ are
conditionally independent, with $Z_0 \sim N(E_s, \sigma^2 E_s)$, and $Z_i \sim N(0, \sigma^2 E_s)$ for $i = 1, \ldots, M-1$.

Let us now express the decision statistics in scale-invariant terms, by replacing $Z_i$ by $Z_i/(\sigma\sqrt{E_s})$. This
gives $Z_0 \sim N(m, 1)$, $Z_1, \ldots, Z_{M-1} \sim N(0, 1)$, conditionally independent, where

$$m = \frac{E_s}{\sigma\sqrt{E_s}} = \sqrt{2E_s/N_0} = \sqrt{2E_b \log_2 M/N_0}$$

The conditional probability of correct reception is now given by

$$P_{c|0} = P[Z_1 \le Z_0, \ldots, Z_{M-1} \le Z_0 \mid H_0] = \int P[Z_1 \le x, \ldots, Z_{M-1} \le x \mid Z_0 = x, H_0]\, p_{Z_0|H_0}(x|H_0)\,dx$$
$$= \int P[Z_1 \le x \mid H_0] \cdots P[Z_{M-1} \le x \mid H_0]\, p_{Z_0|H_0}(x|H_0)\,dx$$

where we have used the conditional independence of the $\{Z_i\}$. Plugging in the conditional
distributions, we get the following expression for the probability of correct reception.

Probability of correct reception for M-ary orthogonal signaling

$$P_c = P_{c|i} = \int_{-\infty}^{\infty} [\Phi(x)]^{M-1}\,\frac{1}{\sqrt{2\pi}}\,e^{-(x-m)^2/2}\,dx$$   (6.72)

where $m = \sqrt{2E_s/N_0} = \sqrt{2E_b \log_2 M/N_0}$ and $\Phi$ denotes the standard Gaussian CDF.

The probability of error is, of course, one minus the preceding expression. But for small error
probabilities, the probability of correct reception is close to one, and it is difficult to get good
estimates of the error probability using (6.72). We therefore develop an expression for the error
probability that can be directly computed, as follows:

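$$P_{e|0} = \sum_{j \neq 0} P[Z_j = \max_i Z_i \mid H_0] = (M-1)\,P[Z_1 = \max_i Z_i \mid H_0]$$

where we have used symmetry. Now,

$$P[Z_1 = \max_i Z_i \mid H_0] = P[Z_0 \le Z_1, Z_2 \le Z_1, \ldots, Z_{M-1} \le Z_1 \mid H_0]$$
$$= \int P[Z_0 \le x, Z_2 \le x, \ldots, Z_{M-1} \le x \mid Z_1 = x, H_0]\, p_{Z_1|H_0}(x|H_0)\,dx$$
$$= \int P[Z_0 \le x \mid H_0]\,P[Z_2 \le x \mid H_0] \cdots P[Z_{M-1} \le x \mid H_0]\, p_{Z_1|H_0}(x|H_0)\,dx$$

Plugging in the conditional distributions, and multiplying by $M-1$, gives the following expression
for the error probability.

Probability of error for M-ary orthogonal signaling

$$P_e = P_{e|i} = (M-1)\int_{-\infty}^{\infty} [\Phi(x)]^{M-2}\,\Phi(x - m)\,\frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}\,dx$$   (6.73)

where $m = \sqrt{2E_s/N_0} = \sqrt{2E_b \log_2 M/N_0}$.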
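
The single integral for the error probability of M-ary orthogonal signaling, $(M-1)\int [\Phi(x)]^{M-2}\,\Phi(x-m)\,\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,dx$ with $m = \sqrt{2E_b\log_2 M/N_0}$, is easy to evaluate numerically. The Python sketch below is one possible approach, not the method used to generate the book's figures: it applies the trapezoidal rule on a finite interval (the limits and step count are assumptions chosen so that the truncated tails are negligible) and compares the result with the union bound $(M-1)Q(\sqrt{E_b\log_2 M/N_0})$. All function names are hypothetical.

```python
import math

def phi_cdf(x):
    """Standard Gaussian CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def orthogonal_pe(M, ebno_db, lo=-10.0, hi=20.0, steps=6000):
    """Error probability integral evaluated by the trapezoidal rule.
    The finite limits lo, hi are assumed wide enough for the Gaussian tails."""
    ebno = 10 ** (ebno_db / 10)
    m = math.sqrt(2 * ebno * math.log2(M))
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * h
        f = (phi_cdf(x) ** (M - 2)) * phi_cdf(x - m) \
            * math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
        total += f if 0 < k < steps else f / 2   # half weight at the endpoints
    return (M - 1) * h * total

def orthogonal_union_bound(M, ebno_db):
    """Union bound (M - 1) Q(sqrt(Eb log2(M) / N0))."""
    ebno = 10 ** (ebno_db / 10)
    return (M - 1) * q_function(math.sqrt(ebno * math.log2(M)))

for M in (2, 4, 16):
    print(M, orthogonal_pe(M, 6.0), orthogonal_union_bound(M, 6.0))
```

For M = 2 there is only one competing signal, so the union bound is exact and the two columns should agree up to integration error; for larger M the integral should fall below the bound.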


Figure  6.27:  Symbol  error  probabilities  for  M- ary  orthogonal  signaling. 


Asymptotics for large M: The error probability for M-ary orthogonal signaling exhibits an
interesting thresholding effect as M gets large:

$$\lim_{M \to \infty} P_e = \begin{cases} 0, & E_b/N_0 > \ln 2 \\ 1, & E_b/N_0 < \ln 2 \end{cases}$$   (6.74)

That is, by letting M get large, we can get arbitrarily reliable performance as long as $E_b/N_0$
exceeds -1.6 dB (ln 2 expressed in dB). This result is derived in one of the problems. Actually, we
can show using the tools of information theory that this is the best we can do over the AWGN
channel in the limit of bandwidth efficiency tending to zero. That is, M-ary orthogonal signaling
is asymptotically optimum in terms of power efficiency.
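
The threshold value quoted in dB follows from a one-line conversion, sketched here in Python for concreteness (the variable names are illustrative):

```python
import math

threshold = math.log(2)                     # ln 2, as a raw Eb/N0 ratio
threshold_db = 10 * math.log10(threshold)   # the same threshold in dB
print(round(threshold_db, 2))  # -1.59
```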


Figure 6.27 shows the probability of symbol error as a function of $E_b/N_0$ for several values of M.
We see that the performance is quite far away from the asymptotic limit of -1.6 dB (also marked
on the plot) for the moderate values of M considered. For example, the $E_b/N_0$ required for
achieving an error probability of $10^{-6}$ for M = 16 is more than 9 dB away from the asymptotic
limit.

6.4 Bit Error Probability


We  now  know  how  to  design  rules  for  deciding  which  of  M  signals  (or  symbols)  has  been  sent, 
and  how  to  estimate  the  performance  of  these  decision  rules.  Sending  one  of  M  signals  conveys 
m  =  log2  M  bits,  so  that  a  hard  decision  on  one  of  these  signals  actually  corresponds  to  hard 
decisions  on  m  bits.  In  this  section,  we  discuss  how  to  estimate  the  bit  error  probability,  or  the 
bit  error  rate  (BER),  as  it  is  often  called. 


Figure 6.28: QPSK with Gray coding.


QPSK with Gray coding: We begin with the example of QPSK, with the bit mapping shown
in Figure 6.28. This bit mapping is an example of a Gray code, in which the bits corresponding
to neighboring symbols differ by exactly one bit (since symbol errors are most likely going to
occur by decoding into neighboring decision regions, this reduces the number of bit errors). Let
us denote the symbol labels as $b[1]b[2]$ for the transmitted symbol, where $b[1]$ and $b[2]$ each take
values 0 and 1. Letting $\hat{b}[1]\hat{b}[2]$ denote the label for the ML symbol decision, the probabilities of
bit error are given by $p_1 = P[\hat{b}[1] \neq b[1]]$ and $p_2 = P[\hat{b}[2] \neq b[2]]$. The average probability of bit
error, which we wish to estimate, is given by $p_b = \frac{1}{2}(p_1 + p_2)$. Conditioned on 00 being sent, the
probability of making an error on $b[1]$ is as follows:

$$P[\hat{b}[1] = 1 \mid 00 \text{ sent}] = P[\text{ML decision is 10 or 11} \mid 00 \text{ sent}] = P\left[N_c < -\frac{d}{2}\right] = Q\left(\frac{d}{2\sigma}\right) = Q\left(\sqrt{\frac{2E_b}{N_0}}\right)$$

where, as before, we have expressed the result in terms of $E_b/N_0$ using the power efficiency
$d^2/E_b = 4$. We also note, by the symmetry of the constellation and the bit map, that the conditional
probability of error of $b[1]$ is the same, regardless of which symbol we condition on. Moreover,
exactly the same analysis holds for $b[2]$, except that errors are caused by the noise random
variable $N_s$. We therefore obtain that

$$p_b = p_1 = p_2 = Q\left(\sqrt{\frac{2E_b}{N_0}}\right) \tag{6.75}$$

The  fact  that  this  expression  is  identical  to  the  bit  error  probability  for  binary  antipodal  signaling 
is  not  a  coincidence.  QPSK  with  Gray  coding  can  be  thought  of  as  two  independent  BPSK 
systems,  one  signaling  along  the  I  component,  and  the  other  along  the  Q  component. 

Gray  coding  is  particularly  useful  at  low  SNR  (e.g.,  for  heavily  coded  systems),  where  symbol 
errors  happen  more  often.  For  example,  in  a  coded  system,  we  would  pass  up  fewer  bit  errors  to 
the  decoder  for  the  same  number  of  symbol  errors.  We  define  it  in  general  as  follows. 
Gray Coding: Consider a $2^n$-ary constellation in which each point is represented by a binary
string $b = (b_1, ..., b_n)$. The bit assignment is said to be Gray coded if, for any two constellation
points $b$ and $b'$ which are nearest neighbors, the bit representations $b$ and $b'$ differ in exactly
one bit location.
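
To make the definition concrete, here is a small sketch that builds a binary-reflected Gray code and checks the one-bit-difference property. It assumes a constellation whose nearest neighbors are adjacent points on a ring (as in PSK); the helper names are illustrative and not from the text.

```python
def gray_labels(n):
    """Binary-reflected Gray code: label k is k XOR (k >> 1), as an n-bit string."""
    return [format(k ^ (k >> 1), "0{}b".format(n)) for k in range(2 ** n)]

def hamming(a, b):
    """Number of positions in which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

def is_gray_on_ring(labels):
    """True if every pair of ring-adjacent labels (with wraparound) differs in one bit."""
    M = len(labels)
    return all(hamming(labels[k], labels[(k + 1) % M]) == 1 for k in range(M))

labels = gray_labels(3)                     # candidate labeling for 8PSK
natural = [format(k, "03b") for k in range(8)]
print(labels, is_gray_on_ring(labels))
print(natural, is_gray_on_ring(natural))    # natural binary order is not Gray coded
```

For two-dimensional grids such as 16QAM, one common construction (again an assumption here, not stated in the text) is to apply such a code separately to the I and Q coordinates.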

Nearest neighbors approximation for BER with Gray coded constellation: Consider
the ith bit $b_i$ in an n-bit Gray code for a regular constellation with minimum distance $d_{\min}$. For
a Gray code, there is at most one nearest neighbor which differs in the ith bit, and the pairwise
error probability of decoding to that neighbor is $Q\left(\frac{d_{\min}}{2\sigma}\right)$. We therefore have

$$P(\text{bit error}) \approx Q\left(\sqrt{\frac{\eta_P E_b}{2N_0}}\right) \quad \text{with Gray coding} \tag{6.76}$$

where $\eta_P = \frac{d_{\min}^2}{E_b}$ is the power efficiency.


Figure  6.29:  BER  for  16QAM  and  16PSK  with  Gray  coding. 


Figure 6.29 shows the BER of 16QAM and 16PSK with Gray coding, comparing the nearest
neighbors approximation with exact results (obtained analytically for 16QAM, and by simulation
for 16PSK). The slight pessimism and ease of computation of the nearest neighbors approximation
imply that it is an excellent tool for link design.
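
As an illustration of how the approximation (6.76) is used in practice, the sketch below computes the power efficiency $d_{\min}^2/E_b$ directly from constellation points, evaluates the approximate BER, and inverts it by bisection to find the $E_b/N_0$ needed for a target BER. The constellation scalings, function names, and bisection bracket are assumptions made for the example.

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def power_efficiency(points, bits_per_symbol):
    """eta_P = d_min^2 / E_b for equiprobable complex constellation points."""
    es = sum(abs(p) ** 2 for p in points) / len(points)
    eb = es / bits_per_symbol
    dmin2 = min(abs(p - q) ** 2
                for i, p in enumerate(points) for q in points[i + 1:])
    return dmin2 / eb

def ber_nearest_neighbors(eta_p, ebn0_db):
    """Nearest neighbors approximation (6.76) for a Gray coded constellation."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return q_func(math.sqrt(eta_p * ebn0 / 2.0))

def required_ebn0_db(eta_p, target_ber, lo=-10.0, hi=40.0):
    """Bisection on Eb/N0 in dB; valid because the BER decreases with Eb/N0."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if ber_nearest_neighbors(eta_p, mid) > target_ber:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

qam16 = [complex(i, q) for i in (-3, -1, 1, 3) for q in (-3, -1, 1, 3)]
psk16 = [complex(math.cos(2 * math.pi * k / 16), math.sin(2 * math.pi * k / 16))
         for k in range(16)]

for name, pts in (("16QAM", qam16), ("16PSK", psk16)):
    eta = power_efficiency(pts, 4)
    print(name, eta, required_ebn0_db(eta, 1e-6))
```

Because the power efficiency is scale-invariant, any convenient scaling of the points gives the same value.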

Gray coding may not always be possible. Indeed, for an arbitrary set of $M = 2^n$ signals, we may
not understand the geometry well enough to assign a Gray code. In general, a necessary (but
not sufficient) condition for an n-bit Gray code to exist is that the number of nearest neighbors
for any signal point should be at most n.

BER for orthogonal modulation: For $M = 2^m$-ary equal energy, orthogonal modulation,
each of the m bits splits the signal set into half. By the symmetric geometry of the signal set,
any of the $M - 1$ wrong symbols are equally likely to be chosen, given a symbol error, and $M/2$ of
these will correspond to error in a given bit. We therefore have

$$P(\text{bit error}) = \frac{M/2}{M-1}\, P(\text{symbol error}), \quad \text{BER for M-ary orthogonal signaling} \tag{6.77}$$

Note that Gray coding is out of the question here, since there are only m bits and $2^m - 1$
neighbors, all at the same distance.
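
The counting argument behind (6.77) can be checked by brute force. The sketch below, with illustrative names, labels the $M = 2^m$ symbols by the integers 0 to M-1 and counts how many of the M-1 wrong symbols differ from the transmitted one in a given bit position.

```python
def wrong_symbols_flipping_bit(m, bit_index=0, sent=0):
    """Return (number of wrong symbols that flip the given bit, number of wrong symbols)."""
    M = 2 ** m
    wrong = [s for s in range(M) if s != sent]
    flipped = [s for s in wrong if ((s ^ sent) >> bit_index) & 1]
    return len(flipped), len(wrong)

def ber_from_ser(m, symbol_error_prob):
    """Bit error probability from symbol error probability, as in (6.77)."""
    M = 2 ** m
    return (M / 2) / (M - 1) * symbol_error_prob

print(wrong_symbols_flipping_bit(3))   # counts for M = 8
print(ber_from_ser(3, 0.07))
```

As M grows, the factor (M/2)/(M-1) approaches 1/2, so roughly half the bits are wrong whenever a symbol error occurs.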
6.5 Link Budget Analysis


We have seen now that performance over the AWGN channel depends only on constellation
geometry and $E_b/N_0$. In order to design a communication link, however, we must relate $E_b/N_0$ to
physical parameters such as transmit power, transmit and receive antenna gains, range and the
quality of the receiver circuitry. Let us first take stock of what we know:

(a) Given the bit rate $R_b$ and the signal constellation, we know the symbol rate (or more
generally, the number of modulation degrees of freedom required per unit time), and hence the
minimum Nyquist bandwidth $B_{\min}$. We can then factor in the excess bandwidth $a$ dictated
by implementation considerations to find the bandwidth $B = (1 + a)B_{\min}$ required. (However,
assuming optimal receiver processing, we show below that the excess bandwidth does not affect
the link budget.)

(b) Given the constellation and a desired bit error probability, we can infer the $E_b/N_0$ we need
to operate at. Since the SNR satisfies $\text{SNR} = \frac{E_b R_b}{N_0 B}$, we have

$$\text{SNR}_{reqd} = \left(\frac{E_b}{N_0}\right)_{reqd} \frac{R_b}{B} \tag{6.78}$$

(c) Given the receiver noise figure F (dB), we can infer the noise power $P_n = N_0 B = N_{0,nom}\, 10^{F/10} B$,
and hence the minimum required received signal power is given by

$$P_{RX}(\min) = \text{SNR}_{reqd}\, P_n = \left(\frac{E_b}{N_0}\right)_{reqd} \frac{R_b}{B}\, N_0 B = \left(\frac{E_b}{N_0}\right)_{reqd} R_b\, N_{0,nom}\, 10^{F/10} \tag{6.79}$$

This is called the required receiver sensitivity, and is usually quoted in dBm, as
$P_{RX,dBm}(\min) = 10 \log_{10} P_{RX}(\min)(\text{mW})$. Using (5.93), we obtain that

$$P_{RX,dBm}(\min) = \left(\frac{E_b}{N_0}\right)_{reqd,dB} + 10 \log_{10} R_b - 174 + F \tag{6.80}$$

where $R_b$ is in bits per second. Note that dependence on bandwidth B (and hence on excess
bandwidth) cancels out in (6.79), so that the final expression for receiver sensitivity depends
only on the required $E_b/N_0$ (which depends on the signaling scheme and target BER), the bit
rate $R_b$, and the noise figure F.
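
The sensitivity formula (6.80) is a one-line computation. The sketch below is a direct transcription; the function name is illustrative, and the constant -174 dBm/Hz is the nominal noise PSD that appears in (6.80). The sample inputs (10.2 dB, 30 Mbps, 6 dB noise figure) are the values used in Example 6.5.1.

```python
import math

def receiver_sensitivity_dbm(ebn0_reqd_db, bit_rate_bps, noise_figure_db):
    """Required receiver sensitivity in dBm, following (6.80)."""
    return ebn0_reqd_db + 10.0 * math.log10(bit_rate_bps) - 174.0 + noise_figure_db

sens = receiver_sensitivity_dbm(10.2, 30e6, 6.0)
print(round(sens, 1))

# Doubling the bit rate costs about 3 dB of sensitivity; bandwidth never enters.
print(receiver_sensitivity_dbm(10.2, 60e6, 6.0) - sens)
```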

Once  we  know  the  receiver  sensitivity,  we  need  to  determine  the  link  parameters  (e.g.,  transmitted 
power,  choice  of  antennas,  range)  such  that  the  receiver  actually  gets  at  least  that  much  power, 
plus  a  link  margin  (typically  expressed  in  dB).  We  illustrate  such  considerations  via  the  Friis 
formula  for  propagation  loss  in  free  space,  which  we  can  think  of  as  modeling  a  line-of-sight 
wireless  link.  While  deriving  this  formula  from  basic  electromagnetics  is  beyond  our  scope  here, 
let  us  provide  some  intuition  before  stating  it. 

Suppose that a transmitter emits power $P_{TX}$ that radiates uniformly in all directions. The power
per unit area at a distance R from the transmitter is $\frac{P_{TX}}{4\pi R^2}$, where we have divided by the area
of a sphere of radius R. The receive antenna may be thought of as providing an effective area,
termed the antenna aperture, for catching a portion of this power. (The aperture of an antenna
is related to its size, but the relation is not usually straightforward.) If we denote the receive
antenna aperture by $A_{RX}$, the received power is given by

$$P_{RX} = \frac{P_{TX}}{4\pi R^2}\, A_{RX}$$

Now, if the transmitter can direct power selectively in the direction of the receiver rather than
radiating it isotropically, we get

$$P_{RX} = \frac{P_{TX}}{4\pi R^2}\, G_{TX} A_{RX} \tag{6.81}$$
where $G_{TX}$ is the transmit antenna’s gain towards the receiver, relative to a hypothetical isotropic
radiator. We now have a formula for received power in terms of transmitted power, which depends
on the gain of the transmit antenna and the aperture of the receive antenna. We would like to
express this formula solely in terms of antenna gains or antenna apertures. To do this, we need
to relate the gain of an antenna to its aperture. To this end, we state without proof that the
aperture of an isotropic antenna is given by $A = \frac{\lambda^2}{4\pi}$. Since the gain of an antenna is the ratio
of its aperture to that of an isotropic antenna, the relation between gain and
aperture can be written as

$$G = \frac{A}{\lambda^2/(4\pi)} = \frac{4\pi A}{\lambda^2} \tag{6.82}$$

Assuming  that  the  aperture  A  scales  up  in  some  fashion  with  antenna  size,  this  implies  that,  for 
a  fixed  form  factor,  we  can  get  higher  antenna  gains  as  we  decrease  the  carrier  wavelength,  or 
increase  the  carrier  frequency. 

Using  (6.82)  in  (6.81),  we  get  two  versions  of  the  Friis  formula: 


Friis formula for free space propagation

$$P_{RX} = P_{TX}\, G_{TX}\, G_{RX}\, \frac{\lambda^2}{16\pi^2 R^2}, \quad \text{in terms of antenna gains} \tag{6.83}$$

$$P_{RX} = P_{TX}\, \frac{A_{TX} A_{RX}}{\lambda^2 R^2}, \quad \text{in terms of antenna apertures} \tag{6.84}$$

where

• $G_{TX}$, $A_{TX}$ are the gain and aperture, respectively, of the transmit antenna,

• $G_{RX}$, $A_{RX}$ are the gain and aperture, respectively, of the receive antenna,

• $\lambda = \frac{c}{f_c}$ is the carrier wavelength ($c = 3 \times 10^8$ meters/sec is the speed of light, $f_c$ the carrier
frequency),

• R is the range (line-of-sight distance between transmitter and receiver).


The  first  version  (6.83)  of  the  Friis  formula  tells  us  that,  for  antennas  with  fixed  gain,  we  should 
try  to  use  as  low  a  carrier  frequency  (as  large  a  wavelength)  as  possible.  On  the  other  hand, 
the  second  version  tells  us  that,  if  we  have  antennas  of  a  given  form  factor,  then  we  can  get 
better  performance  as  we  increase  the  carrier  frequency  (decrease  the  wavelength),  assuming  of 
course that we can “point” these antennas accurately at each other. However, higher carrier
frequencies also have the disadvantage of incurring more attenuation from impairments such as
obstacles, rain, and fog. Some of these tradeoffs are explored in the problems.
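
The two versions of the Friis formula, and the opposite conclusions they suggest about carrier frequency, can be seen numerically. The following sketch uses illustrative function names and made-up aperture and range values; only the formulas (6.82) to (6.84) come from the text.

```python
import math

C = 3e8  # speed of light in meters/sec

def gain_from_aperture(aperture_m2, wavelength_m):
    """Gain-aperture relation (6.82)."""
    return 4.0 * math.pi * aperture_m2 / wavelength_m ** 2

def friis_gains(p_tx, g_tx, g_rx, wavelength_m, range_m):
    """Received power from antenna gains, (6.83)."""
    return p_tx * g_tx * g_rx * wavelength_m ** 2 / (16.0 * math.pi ** 2 * range_m ** 2)

def friis_apertures(p_tx, a_tx, a_rx, wavelength_m, range_m):
    """Received power from antenna apertures, (6.84)."""
    return p_tx * a_tx * a_rx / (wavelength_m ** 2 * range_m ** 2)

# Hypothetical link: 0.1 W transmit power, 1 cm^2 apertures at both ends, 10 m range.
p_tx, aperture, rng = 0.1, 1e-4, 10.0
for fc in (5e9, 60e9):
    lam = C / fc
    g = gain_from_aperture(aperture, lam)
    print(fc, friis_gains(p_tx, g, g, lam, rng), friis_apertures(p_tx, aperture, aperture, lam, rng))
```

With the aperture held fixed, moving from 5 GHz to 60 GHz raises the received power by a factor of $(60/5)^2 = 144$; with the gains held fixed, it would instead lower it by the same factor.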


In order to apply the Friis formula (let us focus on version (6.83) for concreteness) to link budget
analysis, it is often convenient to take logarithms, converting the multiplications into addition.
On a logarithmic scale, antenna gains are expressed in dBi, where $G_{dBi} = 10 \log_{10} G$ for an
antenna with raw gain G. Expressing powers in dBm, we have

$$P_{RX,dBm} = P_{TX,dBm} + G_{TX,dBi} + G_{RX,dBi} + 10 \log_{10} \frac{\lambda^2}{16\pi^2 R^2} \tag{6.85}$$

More generally, we have the link budget equation

$$P_{RX,dBm} = P_{TX,dBm} + G_{TX,dBi} + G_{RX,dBi} - L_{pathloss,dB}(R) \tag{6.86}$$


where $L_{pathloss,dB}(R)$ is the path loss in dB. For free space propagation, we have from the Friis
formula (6.85) that

$$L_{pathloss,dB}(R) = 10 \log_{10} \frac{16\pi^2 R^2}{\lambda^2} \quad \text{path loss in dB for free space propagation} \tag{6.87}$$


While the Friis formula is our starting point, the link budget equation (6.86) applies more
generally, in that we can substitute other expressions for path loss, depending on the propagation
environment. For example, for wireless communication in a cluttered environment, the signal
power may decay as $1/R^4$ rather than the free space decay of $1/R^2$. A mixture of empirical
measurements and statistical modeling is typically used to characterize path loss as a function of
range for the environments of interest. For example, the design of wireless cellular systems is
accompanied by extensive “measurement campaigns” and modeling. Once we decide on the path
loss formula ($L_{pathloss,dB}(R)$) to be used in the design, the transmit power required to attain a
given receiver sensitivity can be determined as a function of range R. Such a path loss formula
typically characterizes an “average” operating environment, around which there might be
significant statistical variations that are not captured by the model used to arrive at the receiver
sensitivity. For example, the receiver sensitivity for a wireless link may be calculated based on the
AWGN channel model, whereas the link may exhibit rapid amplitude variations due to multipath
fading, and slower variations due to shadowing (e.g., due to buildings and other obstacles). Even
if fading/shadowing effects are factored into the channel model used to compute BER, and the
model for path loss, the actual environment encountered may be worse than that assumed in
the model. In general, therefore, we add a link margin $L_{margin,dB}$, again expressed in dB, in an
attempt to budget for potential performance losses due to unmodeled or unforeseen impairments.
The size of the link margin depends, of course, on the confidence of the system designer in the
models used to arrive at the rest of the link budget.

Putting all this together, if $P_{RX,dBm}(\min)$ is the desired receiver sensitivity (i.e., the minimum
required received power), then we compute the transmit power for the link to be

Required transmit power

$$P_{TX,dBm} = P_{RX,dBm}(\min) - G_{TX,dBi} - G_{RX,dBi} + L_{pathloss,dB}(R) + L_{margin,dB} \tag{6.88}$$
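
Equations (6.87) and (6.88) translate directly into a small link budget calculator. The sketch below uses illustrative function names, and the numbers in the usage lines (a -80 dBm sensitivity, 3 dBi antennas, 50 m range, 15 dB margin at 5 GHz) are hypothetical values chosen only to exercise the functions.

```python
import math

def free_space_path_loss_db(range_m, wavelength_m):
    """Free space path loss in dB, (6.87)."""
    return 10.0 * math.log10(16.0 * math.pi ** 2 * range_m ** 2 / wavelength_m ** 2)

def required_tx_power_dbm(sensitivity_dbm, g_tx_dbi, g_rx_dbi, path_loss_db, margin_db):
    """Required transmit power in dBm, (6.88)."""
    return sensitivity_dbm - g_tx_dbi - g_rx_dbi + path_loss_db + margin_db

wavelength = 3e8 / 5e9                       # 5 GHz carrier
loss = free_space_path_loss_db(50.0, wavelength)
print(loss)
print(required_tx_power_dbm(-80.0, 3.0, 3.0, loss, 15.0))
```

Swapping in a different propagation model only requires replacing `free_space_path_loss_db`; the budget equation itself is unchanged.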

Let  us  illustrate  these  concepts  using  some  examples. 


Example 6.5.1 Consider again the 5 GHz WLAN link of Example 5.8.1. We wish to utilize a
20 MHz channel, using Gray coded QPSK and an excess bandwidth of 33%. The receiver has
a noise figure of 6 dB.

(a) What is the bit rate?

(b) What is the receiver sensitivity required to achieve a BER of $10^{-6}$?

(c) Assuming transmit and receive antenna gains of 2 dBi each, what is the range achieved for
100 mW transmit power, using a link margin of 20 dB? Use link budget analysis based on free
space path loss.

Solution (a) For bandwidth B and fractional excess bandwidth a, the symbol rate

$$R_s = \frac{1}{T} = \frac{B}{1 + a} = \frac{20}{1 + 0.33} = 15 \text{ Msymbols/sec}$$

and the bit rate for an M-ary constellation is

$$R_b = R_s \log_2 M = 15 \text{ Msymbols/sec} \times 2 \text{ bits/symbol} = 30 \text{ Mbits/sec}$$

(b) BER for QPSK with Gray coding is $Q\left(\sqrt{\frac{2E_b}{N_0}}\right)$. For a desired BER of $10^{-6}$, we obtain that
$\left(\frac{E_b}{N_0}\right)_{reqd,dB} \approx 10.2$. Plugging in $R_b = 30$ Mbps and F = 6 dB in (6.80), we obtain that the
required receiver sensitivity is $P_{RX,dBm}(\min) = -83$ dBm.

(c) The transmit power is 100 mW, or 20 dBm. Rewriting (6.88), the allowed path loss to attain
the desired sensitivity at the desired link margin is

$$L_{pathloss,dB}(R) = P_{TX,dBm} - P_{RX,dBm}(\min) + G_{TX,dBi} + G_{RX,dBi} - L_{margin,dB} = 20 - (-83) + 2 + 2 - 20 = 87 \text{ dB} \tag{6.89}$$
We can now invert the formula for free space loss, (6.87), noting that $f_c = 5$ GHz, which implies
$\lambda = \frac{c}{f_c} = 0.06$ m. We get a range R of 107 meters, which is of the order of the advertised ranges
for WLANs under nominal operating conditions. The range decreases, of course, for higher bit
rates using larger constellations. What happens, for example, when we use 16QAM or 64QAM?
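
The steps of Example 6.5.1 can be retraced in a few lines. This is a sketch under the example's stated assumptions; it takes the required $E_b/N_0$ of 10.2 dB quoted in the solution as an input rather than recomputing it, and the helper name `range_from_path_loss` is illustrative. Because intermediate values are not rounded, the printed numbers may differ slightly from the rounded figures in the text.

```python
import math

def range_from_path_loss(path_loss_db, wavelength_m):
    """Invert (6.87): L = 10 log10(16 pi^2 R^2 / lambda^2), solved for R."""
    return wavelength_m / (4.0 * math.pi) * 10.0 ** (path_loss_db / 20.0)

# (a) bit rate for QPSK in a 20 MHz channel with 33% excess bandwidth
symbol_rate = 20e6 / (1 + 0.33)
bit_rate = symbol_rate * 2                      # 2 bits per QPSK symbol

# (b) receiver sensitivity from (6.80), with Eb/N0 = 10.2 dB and F = 6 dB
sensitivity_dbm = 10.2 + 10.0 * math.log10(bit_rate) - 174.0 + 6.0

# (c) allowed path loss from (6.89), then range from the free space formula
tx_power_dbm = 10.0 * math.log10(100.0)         # 100 mW expressed in dBm
allowed_loss_db = tx_power_dbm - sensitivity_dbm + 2.0 + 2.0 - 20.0
wavelength = 3e8 / 5e9
range_m = range_from_path_loss(allowed_loss_db, wavelength)

print(bit_rate / 1e6, sensitivity_dbm, allowed_loss_db, range_m)
```

Rerunning with a larger required $E_b/N_0$ and more bits per symbol shows how the range shrinks for 16QAM or 64QAM.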


Example 6.5.2 Consider an indoor link at 10 meters range using unlicensed spectrum at 60
GHz. Suppose that the transmitter and receiver each use antennas with horizontal beamwidths
of 60° and vertical beamwidths of 30°. Use the following approximation to calculate the resulting
antenna gains:

$$G \approx \frac{41000}{\theta_{horiz}\, \theta_{vert}}$$

where G denotes the antenna gain (linear scale), $\theta_{horiz}$ and $\theta_{vert}$ denote horizontal and vertical
beamwidths (in degrees). Set the noise figure to 8 dB, and assume a link margin of 10 dB at
BER of $10^{-6}$.

(a)  Calculate  the  bandwidth  and  transmit  power  required  for  a  2  Gbps  link  using  Gray  coded 
QPSK  and  50%  excess  bandwidth. 

(b)  How  do  your  answers  change  if  you  change  the  signaling  scheme  to  Gray  coded  16QAM, 
keeping  the  same  bit  rate  as  in  (a)? 

(c)  If  you  now  employ  Gray  coded  16QAM  keeping  the  same  symbol  rate  as  in  (a),  what  is  the 
bit  rate  attained  and  the  transmit  power  required? 

(d)  How  do  the  answers  in  the  setting  of  (a)  change  if  you  increase  the  horizontal  beamwidth  to 
120°,  keeping  all  other  parameters  fixed? 

Solution: (a) A 2 Gbps link using QPSK corresponds to a symbol rate of 1 Gsymbols/sec.
Factoring in the 50% excess bandwidth, the required bandwidth is B = 1.5 GHz. The target
BER and constellation are as in the previous example, hence we still have $(E_b/N_0)_{reqd,dB} \approx 10.2$
dB. Plugging in $R_b = 2$ Gbps and F = 8 dB in (6.80), we obtain that the required receiver
sensitivity is $P_{RX,dBm}(\min) = -62.8$ dBm.

The antenna gains at each end are given by

$$G \approx \frac{41000}{60 \times 30} = 22.78$$

Converting to dB scale, we obtain $G_{TX,dBi} = G_{RX,dBi} = 13.58$ dBi.

The transmit power for a range of 10 m can now be obtained using (6.88) to be 8.1 dBm.

(b) For the same bit rate of 2 Gbps, the symbol rate for 16QAM is 0.5 Gsymbols/sec, so that
the bandwidth required is 0.75 GHz, factoring in 50% excess bandwidth. The nearest neighbors
approximation to BER for Gray coded 16QAM is given by $Q\left(\sqrt{\frac{4E_b}{5N_0}}\right)$. Using this, we find that
a target BER of $10^{-6}$ requires $(E_b/N_0)_{reqd,dB} \approx 14.54$ dB, an increase of 4.34 dB relative to (a).
This leads to a corresponding increase in the receiver sensitivity to -58.45 dBm, which leads to
the required transmit power increasing to 12.4 dBm.

(c) If we keep the symbol rate fixed at 1 Gsymbols/sec, the bit rate with 16QAM is $R_b = 4$ Gbps.
As in (b), $(E_b/N_0)_{reqd,dB} \approx 14.54$ dB. The receiver sensitivity is therefore given by -55.45 dBm,
a 3 dB increase over (b), corresponding to the doubling of the bit rate. This translates directly
to a 3 dB increase, relative to (b), in transmit power to 15.4 dBm, since the path loss, antenna
gains, and link margin are as in (b).

(d)  We  now  go  back  to  the  setting  of  (a),  but  with  different  antenna  gains.  The  bandwidth  is, 
of  course,  unchanged  from  (a).  The  new  antenna  gains  are  3  dB  smaller  because  of  the  doubling 
of  horizontal  beamwidth.  The  receiver  sensitivity,  path  loss  and  link  margin  are  as  in  (a),  thus 
the  3  dB  reduction  in  antenna  gains  at  each  end  must  be  compensated  for  by  a  6  dB  increase  in 
transmit  power  relative  to  (a).  Thus,  the  required  transmit  power  is  14.1  dBm. 
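
The four parts of Example 6.5.2 differ only in a few inputs, so they can be retraced with one helper. The sketch below is an illustration under the example's assumptions: it takes the required $E_b/N_0$ values quoted in the solution (10.2 dB for QPSK, 14.54 dB for 16QAM) as inputs, and the function names are not from the text.

```python
import math

def antenna_gain_dbi(theta_h_deg, theta_v_deg):
    """Beamwidth approximation G = 41000 / (theta_horiz * theta_vert), in dBi."""
    return 10.0 * math.log10(41000.0 / (theta_h_deg * theta_v_deg))

def bandwidth_hz(bit_rate, bits_per_symbol, excess=0.5):
    """Occupied bandwidth: symbol rate scaled by (1 + excess bandwidth)."""
    return (1.0 + excess) * bit_rate / bits_per_symbol

def tx_power_dbm(ebn0_db, bit_rate, gain_dbi, noise_figure_db=8.0,
                 range_m=10.0, wavelength_m=3e8 / 60e9, margin_db=10.0):
    """Sensitivity from (6.80), free space loss from (6.87), budget from (6.88)."""
    sensitivity = ebn0_db + 10.0 * math.log10(bit_rate) - 174.0 + noise_figure_db
    path_loss = 20.0 * math.log10(4.0 * math.pi * range_m / wavelength_m)
    return sensitivity - 2.0 * gain_dbi + path_loss + margin_db

g_narrow = antenna_gain_dbi(60.0, 30.0)
g_wide = antenna_gain_dbi(120.0, 30.0)

part_a = tx_power_dbm(10.2, 2e9, g_narrow)    # QPSK, 2 Gbps
part_b = tx_power_dbm(14.54, 2e9, g_narrow)   # 16QAM, same bit rate
part_c = tx_power_dbm(14.54, 4e9, g_narrow)   # 16QAM, same symbol rate
part_d = tx_power_dbm(10.2, 2e9, g_wide)      # QPSK, 120 degree horizontal beamwidth

print(bandwidth_hz(2e9, 2) / 1e9, bandwidth_hz(2e9, 4) / 1e9)
print(part_a, part_b, part_c, part_d)
```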


Discussion: The parameter choices in the preceding examples illustrate how physical
characteristics of the medium change with choice of carrier frequency, and affect system design tradeoffs.
The 5 GHz system in Example 6.5.1 employs essentially omnidirectional antennas with small
gains  of  2  dBi,  whereas  it  is  possible  to  realize  highly  directional  yet  small  antennas  (e.g.,  using 
electronically  steerable  printed  circuit  antenna  arrays)  for  the  60  GHz  system  in  Example  6.5.2 
by  virtue  of  the  small  (5  mm)  wavelength.  60  GHz  waves  are  easily  blocked  by  walls,  hence  the 
range  in  Example  6.5.2  corresponds  to  in-room  communication.  We  have  also  chosen  parameters 
such  that  the  transmit  power  required  for  60  GHz  is  smaller  than  that  at  5  GHz,  since  it  is 
more  difficult  to  produce  power  at  higher  radio  frequencies.  Finally,  the  link  margin  for  5  GHz 
is  chosen  higher  than  for  60  GHz:  propagation  at  60  GHz  is  near  line-of-sight,  whereas  fading 
due  to  multipath  propagation  at  5  GHz  can  be  more  significant,  and  hence  may  require  a  higher 
link  margin  relative  to  the  AWGN  benchmark  which  provides  the  basis  for  our  link  budget. 


6.6  Concept  Summary 


This chapter establishes a systematic hypothesis testing based framework for demodulation,
develops tools for performance evaluation which enable exploration of the power-bandwidth
tradeoffs exhibited by different signaling schemes, and relates these mathematical models to physical link
parameters via the link budget. A summary of some key concepts and results is as follows.

Hypothesis  testing 

• The probability of error is minimized by choosing the hypothesis with the maximum a posteriori
probability (i.e., the hypothesis that is most likely conditioned on the observation). That is, the
MPE rule is also the MAP rule:

$$\delta_{MPE}(y) = \delta_{MAP}(y) = \arg\max_{1 \le i \le M} P[H_i \mid Y = y] = \arg\max_{1 \le i \le M} \pi_i\, p(y|i) = \arg\max_{1 \le i \le M} \log \pi_i + \log p(y|i)$$

For equal priors, the MPE rule coincides with the ML rule:

$$\delta_{ML}(y) = \arg\max_{1 \le i \le M} p(y|i) = \arg\max_{1 \le i \le M} \log p(y|i)$$


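• For binary hypothesis testing, ML and MPE rules can be written as likelihood, or log likelihood,
ratio tests:

$$L(y) = \frac{p_1(y)}{p_0(y)} \underset{H_0}{\overset{H_1}{\gtrless}} 1 \quad \text{or} \quad \log L(y) \underset{H_0}{\overset{H_1}{\gtrless}} 0 \qquad \text{ML rule}$$

$$L(y) = \frac{p_1(y)}{p_0(y)} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{\pi_0}{\pi_1} \quad \text{or} \quad \log L(y) \underset{H_0}{\overset{H_1}{\gtrless}} \log \frac{\pi_0}{\pi_1} \qquad \text{MPE/MAP rule}$$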

Geometric  view  of  signals 

Continuous-time signals can be interpreted as vectors in Euclidean space, with inner product
$\langle s_1, s_2 \rangle = \int s_1(t) s_2^*(t)\, dt$, norm $\|s\| = \sqrt{\langle s, s \rangle}$, and energy $\|s\|^2 = \langle s, s \rangle$. Two signals are
orthogonal if their inner product is zero.

Geometric  view  of  WGN 

• WGN $n(t)$ with PSD $\sigma^2$, when projected in any “direction” (i.e., correlated against any unit
energy signal), yields an $N(0, \sigma^2)$ random variable.

• More generally, projections of the noise along any signals are jointly Gaussian, with zero mean
and $\text{cov}(\langle n, u \rangle, \langle n, v \rangle) = \sigma^2 \langle v, u \rangle$.

• Noise projections along orthogonal signals are uncorrelated. Since they are jointly Gaussian,
they are also independent.
Signal space

• M-ary signaling in AWGN in continuous time can be reduced, without loss of information,
to M-ary signaling in finite-dimensional vector space with each dimension seeing i.i.d. $N(0, \sigma^2)$
noise, which corresponds to discrete time WGN. This is accomplished by projecting the received
signal onto the signal space spanned by the M possible signals.

•  Decision  rules  derived  using  hypothesis  testing  in  the  finite-dimensional  signal  space  map 
directly  back  to  continuous  time  because  of  two  key  reasons:  signal  inner  products  are  preserved, 
and  the  noise  component  orthogonal  to  the  signal  space  is  irrelevant.  Because  of  this  equivalence, 
we  can  stop  making  a  distinction  between  continuous  time  signals  and  finite-dimensional  vector 
signals  in  our  notation. 

Optimal  demodulation 

• For the model $H_i: y = s_i + n$, $0 \le i \le M-1$, optimum demodulation involves computation of
the correlator outputs $Z_i = \langle y, s_i \rangle$. This can be accomplished by using a bank of correlators or
matched filters, but any other receiver structure that yields the statistics $\{Z_i\}$ would also
preserve all of the relevant information.

• The ML and MPE rules are given by

$$\delta_{ML}(y) = \arg\max_{0 \le i \le M-1} \langle y, s_i \rangle - \frac{\|s_i\|^2}{2}$$

$$\delta_{MPE}(y) = \arg\max_{0 \le i \le M-1} \langle y, s_i \rangle - \frac{\|s_i\|^2}{2} + \sigma^2 \log \pi_i$$

When the received signal lies in a finite-dimensional space in which the noise has finite energy,
the ML rule can be written as a minimum distance rule (and the MPE rule as a variant thereof)
as follows:

$$\delta_{ML}(y) = \arg\min_{0 \le i \le M-1} \|y - s_i\|^2$$

$$\delta_{MPE}(y) = \arg\min_{0 \le i \le M-1} \|y - s_i\|^2 - 2\sigma^2 \log \pi_i$$


Geometry  of  ML  rule:  ML  decision  boundaries  are  formed  from  hyperplanes  that  bisect  lines 
connecting  signal  points. 
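
The minimum distance form of the ML and MPE rules is straightforward to implement for real vectors. The following sketch uses a QPSK-like set of four points and made-up priors purely for illustration; the function names are not from the text.

```python
import math

def sq_dist(y, s):
    """Squared Euclidean distance between two real vectors."""
    return sum((a - b) ** 2 for a, b in zip(y, s))

def ml_decision(y, signals):
    """ML rule: index of the signal closest to y."""
    return min(range(len(signals)), key=lambda i: sq_dist(y, signals[i]))

def mpe_decision(y, signals, priors, sigma2):
    """MPE rule: minimize ||y - s_i||^2 - 2 sigma^2 log(pi_i)."""
    return min(range(len(signals)),
               key=lambda i: sq_dist(y, signals[i]) - 2.0 * sigma2 * math.log(priors[i]))

signals = [(1.0, 1.0), (-1.0, 1.0), (-1.0, -1.0), (1.0, -1.0)]
y = (0.2, -0.1)

print(ml_decision(y, signals))
print(mpe_decision(y, signals, [0.25] * 4, sigma2=1.0))                 # equal priors: same as ML
print(mpe_decision(y, signals, [0.97, 0.01, 0.01, 0.01], sigma2=1.0))   # skewed priors shift the boundary
```

With equal priors the log-prior term is the same for every hypothesis and drops out, which is why the MPE decision then matches the ML decision.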

Performance  analysis 

• For binary signaling, the error probability for the ML rule is given by

$$P_e = Q\left(\frac{d}{2\sigma}\right) = Q\left(\sqrt{\frac{\eta_P E_b}{2N_0}}\right)$$

where $d = \|s_1 - s_0\|$ is the Euclidean distance between the signals. The performance therefore
depends on the power efficiency $\eta_P = \frac{d^2}{E_b}$ and the SNR $E_b/N_0$. Since the power efficiency is
scale-invariant, we may choose any convenient scaling when computing it for a given constellation.

• For M-ary signaling, closed form expressions for the error probability may not be available,
but we know that the performance depends only on the scale-invariant inner products $\left\{\frac{\langle s_i, s_j \rangle}{E_b}\right\}$,
which depend on the constellation “shape” alone, and on $E_b/N_0$.

• The conditional error probabilities for M-ary signaling can be bounded using the union bound
(these can then be averaged to obtain an upper bound on the average error probability):

$$P_{e|i} \le \sum_{j \ne i} Q\left(\frac{d_{ij}}{2\sigma}\right)$$

where $d_{ij} = \|s_i - s_j\|$ are the pairwise distances between signal points.

• When we understand the shape of the decision regions, we can tighten the union bound into
an intelligent union bound:

$$P_{e|i} \le \sum_{j \in N_{ml}(i)} Q\left(\frac{d_{ij}}{2\sigma}\right)$$

where $N_{ml}(i)$ denotes the set of neighbors of $s_i$ which define the decision region $\Gamma_i$.

• For regular constellations, the nearest neighbors approximation is given by

$$P_e \approx \bar{N}_{d_{\min}}\, Q\left(\frac{d_{\min}}{2\sigma}\right) = \bar{N}_{d_{\min}}\, Q\left(\sqrt{\frac{\eta_P E_b}{2N_0}}\right)$$

with $\eta_P = \frac{d_{\min}^2}{E_b}$ providing a measure of power efficiency which can be used to compare across
constellations.

• If Gray coding is possible, the bit error probability can be estimated as

$$P(\text{bit error}) \approx Q\left(\sqrt{\frac{\eta_P E_b}{2N_0}}\right)$$

Link budget: This relates (e.g., using the Friis formula for free space propagation) the
performance of a communication link to physical parameters such as transmit power, transmit and
receive antenna gains, range, and receiver noise figure. A link margin is typically introduced to
account for unmodeled impairments.


6.7  Endnotes 


The geometric signal space approach for deriving and analyzing receivers is now standard in textbooks
on  communication  theory,  such  as  [7,  8].  It  was  first  developed  by  Russian  pioneer  Vladimir 
Kotelnikov  [33],  and  presented  in  a  cohesive  fashion  in  the  classic  textbook  by  Wozencraft  and 
Jacobs  [9]. 

A  number  of  details  of  receiver  design  have  been  swept  under  the  rug  in  this  chapter.  Our 
model  for  the  received  signal  is  that  it  equals  the  transmitted  signal  plus  WGN.  In  practice, 
the transmitted signal can be significantly distorted by the channel (e.g., scaling, delay, multipath
propagation). However, the basic M-ary signaling model is still preserved: if M possible
signals  are  sent,  then,  prior  to  the  addition  of  noise,  M  possible  signals  are  received  after  the 
deterministic  (but  a  priori  unknown)  transformations  due  to  channel  impairments.  The  receiver 
can therefore estimate noiseless copies of the latter and then apply the optimum demodulation
techniques developed here. This approach leads, for example, to the optimal equalization
strategies  developed  by  Forney  [34]  and  Ungerboeck  [35];  see  Chapter  5  of  [7]  for  a  textbook 
exposition.  Estimation  of  the  noiseless  received  signals  involves  tasks  such  as  carrier  phase  and 
frequency synchronization, timing synchronization, and estimation of the channel impulse response
or transfer function. In modern digital communication transceivers, these operations
are  typically  all  performed  using  DSP  on  the  complex  baseband  received  signal.  Perhaps  the 
best approach for exploring further is to acquire a basic understanding of the relevant estimation
techniques, and to then go to technical papers of specific interest (e.g., IEEE conference
and journal publications). Classic texts covering estimation theory include Kay [36], Poor [37]
and  Van  Trees  [38].  Several  graduate  texts  in  communications  contain  a  brief  discussion  of  the 
modern  estimation-theoretic  approach  to  synchronization  that  may  provide  a  helpful  orientation 
prior  to  going  to  the  research  literature;  for  example,  see  [7]  (Chapter  4)  and  [11,  39]  (Chapter 
8). 


6.8  Problems 


Hypothesis  Testing 


Problem 6.1 The received signal in a digital communication system is given by

$$y(t) = \begin{cases} s(t) + n(t), & 1 \text{ sent} \\ n(t), & 0 \text{ sent} \end{cases}$$

where n is AWGN with PSD $\sigma^2 = N_0/2$ and $s(t)$ is as shown below. The received signal is passed
through a filter, and the output is sampled to yield a decision statistic. An ML decision rule is
employed based on the decision statistic. The set-up is shown in Figure 6.30.

Figure 6.30: Set-up for Problem 6.1

(a) For $h(t) = s(-t)$, find the error probability as a function of $E_b/N_0$ if $t_0 = 1$.

(b) Can the error probability in (a) be improved by choosing the sampling time $t_0$ differently?

(c) Now, find the error probability as a function of $E_b/N_0$ for $h(t) = I_{[0,2]}(t)$ and the best possible
choice of sampling time.

(d) Finally, comment on whether you can improve the performance in (c) by using a linear
combination of two samples as a decision statistic, rather than just using one sample.


Problem 6.2 Consider binary hypothesis testing based on the decision statistic Y, where
$Y \sim N(2, 9)$ under $H_1$ and $Y \sim N(-2, 4)$ under $H_0$.

(a) Show that the optimal (ML or MPE) decision rule is equivalent to comparing a function of
the form $ay^2 + by$ to a threshold.

(b) Specify the MPE rule explicitly (i.e., specify a, b and the threshold) when $\pi_0 =$

(c) Express the conditional error probability $P_{e|0}$ for the decision rule in (b) in terms of the Q
function with positive arguments. Also provide a numerical value for this probability.

Problem  6.3  Find  and  sketch  the  decision  regions  for  a  binary  hypothesis  testing  problem  with 
observation  Z,  where  the  hypotheses  are  equally  likely,  and  the  conditional  distributions  are 
given  by 

$H_0$: $Z$ is uniform over $[-2, 2]$

$H_1$: $Z$ is Gaussian with mean 0 and variance 1.

Problem 6.4 The receiver in a binary communication system employs a decision statistic $Z$
which  behaves  as  follows: 

Z  =  N  if  0  is  sent 
Z  =  4  +  N  if  1  is  sent 

where  N  is  modeled  as  Laplacian  with  density 


pN(x) 


$-\infty < x < \infty$


Note:  Parts  (a)  and  (b)  can  be  done  independently. 

(a)  Find  and  sketch,  as  a  function  of  z,  the  log  likelihood  ratio 


$$K(z) = \log L(z) = \log \frac{p(z|1)}{p(z|0)}$$

where $p(z|i)$ denotes the conditional density of $Z$ given that $i$ is sent ($i = 0, 1$).

(b) Find $P_{e|1}$, the conditional error probability given that 1 is sent, for the decision rule

$$\delta(z) = \begin{cases} 0, & z < 1 \\ 1, & z > 1 \end{cases}$$


(c)  Is  the  rule  in  (b)  the  MPE  rule  for  any  choice  of  prior  probabilities?  If  so,  specify  the  prior 
probability $\pi_0 = P[0 \text{ sent}]$ for which it is the MPE rule. If not, say why not.


Problem 6.5 Consider the MAP/MPE rule for the hypothesis testing problem in Example 6.1.1.

(a) Show that the MAP rule always says $H_1$ if the prior probability of $H_0$ is smaller than some
positive threshold. Specify this threshold.

(b) Compute and plot the conditional probabilities $P_{e|0}$ and $P_{e|1}$, and the average error
probability $P_e$, versus $\pi_0$ as the latter varies in $[0, 1]$.

(c)  Discuss  any  trends  that  you  see  from  the  plots  in  (b). 


Problem  6.6  Consider  a  MAP  receiver  for  the  basic  Gaussian  example,  as  discussed  in  Example 
6.1.2.  Fix  SNR  at  13  dB.  We  wish  to  explore  the  effect  of  prior  mismatch,  by  quantifying  the 
performance  degradation  of  a  MAP  receiver  if  the  actual  priors  are  different  from  the  priors  for 
which  it  has  been  designed. 

(a) Plot the average error probability for a MAP receiver designed for $\pi_0 = 0.2$, as $\pi_0$ varies
from  0  to  1.  As  usual,  use  a  log  scale  for  the  probabilities.  On  the  same  plot,  also  plot  the  error 
probability  of  the  ML  receiver  as  a  benchmark. 

(b)  From  the  plot  in  (a),  comment  on  how  much  error  you  can  tolerate  in  the  prior  probabilities 
before  the  performance  of  the  MAP  receiver  designed  for  the  given  prior  becomes  unacceptable. 

(c) Repeat (a) and (b) for a MAP receiver designed for $\pi_0 = 0.4$. Is the performance more or
less  sensitive  to  errors  in  the  priors? 


Problem 6.7 Consider binary hypothesis testing in which the observation $Y$ is modeled as uniformly
distributed over $[-2, 2]$ under $H_0$, and has conditional density $p(y|1) = c(1 - |y|/3) I_{[-3,3]}(y)$
under $H_1$, where $c > 0$ is a constant to be determined.

(a) Find $c$.

(b) Find and sketch the decision regions $\Gamma_0$ and $\Gamma_1$ corresponding to the ML decision rule.

(c) Find the conditional error probabilities.

Problem 6.8 Consider binary hypothesis testing with scalar observation $Y$. Under hypothesis
$H_0$, $Y$ is modeled as uniformly distributed over $[-5, 5]$. Under $H_1$, $Y$ has conditional density
$p(y|1) = \frac{1}{8} e^{-|y|/4}$, $-\infty < y < \infty$.

(a) Specify the ML rule and clearly draw the decision regions $\Gamma_0$ and $\Gamma_1$ on the real line.

(b)  Find  the  conditional  probabilities  of  error  for  the  ML  rule  under  each  hypothesis. 

Problem 6.9 For the setting of Problem 6.8, suppose that the prior probability of $H_0$ is 1/3.

(a)  Specify  the  MPE  rule  and  draw  the  decision  regions. 

(b)  Find  the  conditional  error  probabilities  and  the  average  error  probability.  Compare  with  the 
corresponding  quantities  for  the  ML  rule  considered  in  Problem  6.8. 


Problem 6.10 The receiver output $Z$ in an on-off keyed optical communication system is modeled
as a Poisson random variable with mean $m_0 = 1$ if 0 is sent, and mean $m_1 = 10$ if 1 is sent.

(a)  Show  that  the  ML  rule  consists  of  comparing  Z  to  a  threshold,  and  specify  the  numerical 
value  of  the  threshold.  Note  that  Z  can  only  take  nonnegative  integer  values. 

(b)  Compute  the  conditional  error  probabilities  for  the  ML  rule  (compute  numerical  values  in 
addition  to  deriving  formulas). 

(c)  Find  the  MPE  rule  if  the  prior  probability  of  sending  1  is  0.1. 

(d)  Compute  the  average  error  probability  for  the  MPE  rule. 


Problem  6.11  The  received  sample  Y  in  a  binary  communication  system  is  modeled  as  follows: 
Y  =  A  +  N  if  0  is  sent,  and  Y  =  —A  +  N  if  1  is  sent,  where  N  is  Laplacian  noise  with  density 

$$p_N(x) = \frac{\lambda}{2} e^{-\lambda |x|}, \quad -\infty < x < \infty$$


(a)  Find  the  ML  decision  rule.  Simplify  as  much  as  possible. 

(b)  Find  the  conditional  error  probabilities  for  the  ML  rule. 

(c)  Now,  suppose  that  the  prior  probability  of  sending  0  is  1/3.  Find  the  MPE  rule,  simplifying 
as  much  as  possible. 

(d)  In  the  setting  of  (c),  find  the  LLR  log  pjijyC^]  • 


Problem 6.12 Consider binary hypothesis testing with scalar observation $Y$. Under hypothesis
$H_0$, $Y$ is modeled as an exponential random variable with mean 5. Under hypothesis $H_1$, $Y$ is
modeled as uniformly distributed over the interval $[0, 10]$.

(a) Specify the ML rule and clearly draw the decision regions $\Gamma_0$ and $\Gamma_1$ on the real line.

(b) Find the conditional probability of error for the ML rule, given that $H_0$ is true.

(c) Suppose that the prior probability of $H_0$ is 1/3. Compute the posterior probability of $H_0$
given that we observe $Y = 4$ (i.e., find $P[H_0 | Y = 4]$).


Problem  6.13  Consider  hypothesis  testing  in  which  the  observation  Y  is  given  by  the  following 
model: 

$H_1$: $Y = 6 + N$
$H_0$: $Y = N$

where the noise $N$ has density $p_N(x) = \frac{1}{10}\left(1 - \frac{|x|}{10}\right) I_{[-10,10]}(x)$.

(a)  Find  the  conditional  error  probability  given  Hi  for  the  following  decision  rule: 

$$Y \underset{H_0}{\overset{H_1}{\gtrless}} 4$$

(b) Is there a set of prior probabilities for which the decision rule in (a) minimizes the error
probability? If so, specify them. If not, say why not.


Receiver design and performance analysis for the AWGN channel

Problem 6.14 Consider binary signaling in AWGN, with $s_1(t) = (1 - |t|) I_{[-1,1]}(t)$ and
$s_0(t) = -s_1(t)$. The received signal is given by $y(t) = s_i(t) + n(t)$, $i = 0, 1$, where the noise $n$ has PSD
$\sigma^2 = N_0/2 = 0.1$. For all of the error probabilities computed in this problem, specify in terms of
the $Q$ function with positive arguments and also give numerical values.

(a) How would you implement the ML receiver using the received signal $y(t)$? What is its
conditional error probability given that $s_0$ is sent?

Now,  consider  a  suboptimal  receiver,  where  the  receiver  generates  the  following  decision  statistics: 

$$y_0 = \int_{-1}^{-0.5} y(t)\,dt, \quad y_1 = \int_{-0.5}^{0} y(t)\,dt, \quad y_2 = \int_{0}^{0.5} y(t)\,dt, \quad y_3 = \int_{0.5}^{1} y(t)\,dt$$

(b) Specify the conditional distribution of $\mathbf{y} = (y_0, y_1, y_2, y_3)^T$, conditioned on $s_0$ being sent.

(c) Specify the ML rule when the observation is $\mathbf{y}$. What is its conditional error probability
given that $s_0$ is sent?

(d) Specify the ML rule when the observation is $y_0 + y_1 + y_2 + y_3$. What is its conditional error
probability, given that $s_0$ is sent?

(e)  Among  the  error  probabilities  in  (a),  (c)  and  (d),  which  is  the  smallest?  Which  is  the  biggest? 
Could  you  have  rank  ordered  these  error  probabilities  without  actually  computing  them? 

Problem  6.15  The  received  signal  in  an  on-off  keyed  digital  communication  system  is  given  by 

$$y(t) = \begin{cases} s(t) + n(t), & 1 \text{ sent} \\ n(t), & 0 \text{ sent} \end{cases}$$

where $n$ is AWGN with PSD $\sigma^2 = N_0/2$, and $s(t) = A(1 - |t|) I_{[-1,1]}(t)$, where $A > 0$. The received
signal is passed through a filter with impulse response $h(t) = I_{[0,1]}(t)$ to obtain $z(t) = (y * h)(t)$.
Remark: It would be helpful to draw a picture of the system before you start doing the calculations.

(a) Consider the decision statistic $Z = z(0) + z(1)$. Specify the conditional distribution of $Z$
given that 0 is sent, and the conditional distribution of $Z$ given that 1 is sent.

(b) Assuming that the receiver must make its decision based on $Z$, specify the ML rule and its
error probability in terms of $E_b/N_0$ (express your answer in terms of the $Q$ function with positive
arguments).

(c) Find the error probability (in terms of $E_b/N_0$) for ML decisions based on the decision statistic
$Z_2 = z(0) + z(0.5) + z(1)$.


Problem  6.16  Consider  binary  signaling  in  AWGN  using  the  signals  depicted  in  Figure  6.31. 
The  received  signal  is  given  by 


$$y(t) = \begin{cases} s_1(t) + n(t), & 1 \text{ sent} \\ s_0(t) + n(t), & 0 \text{ sent} \end{cases}$$

where $n(t)$ is WGN with PSD $\sigma^2 = N_0/2$.

(a) Show that the ML decision rule can be implemented by comparing $Z = \int y(t) a(t)\,dt$ to a
threshold $\gamma$. Sketch $a(t)$ and specify the corresponding value of $\gamma$.

(b)  Specify  the  error  probability  of  the  ML  rule  as  a  function  of  Eb/N0. 

(c)  Can  the  MPE  rule,  assuming  that  the  prior  probability  of  sending  0  is  1/3,  be  implemented 
using  the  same  receiver  structure  as  in  (a)?  What  would  need  to  change?  (Be  specific.) 

(d) Consider now a suboptimal receiver structure in which $y(t)$ is passed through a filter with
impulse response $h(t) = I_{[0,1]}(t)$, and we take three samples: $Z_1 = (y * h)(1)$, $Z_2 = (y * h)(2)$,
$Z_3 = (y * h)(3)$. Specify the conditional distribution of $\mathbf{Z} = (Z_1, Z_2, Z_3)^T$ given that 0 is sent.

(e) (more challenging) Specify the ML rule based on $\mathbf{Z}$ and the corresponding error probability
as a function of $E_b/N_0$.


Figure 6.31: Signal Set for Problem 6.16.


Problem 6.17 Let $p_1(t) = I_{[0,1]}(t)$ denote a rectangular pulse of unit duration. Consider two
4-ary signal sets as follows:

Signal Set A: $s_i(t) = p_1(t - i)$, $i = 0, 1, 2, 3$.

Signal Set B: $s_0(t) = p_1(t) + p_1(t - 3)$, $s_1(t) = p_1(t - 1) + p_1(t - 2)$, $s_2(t) = p_1(t) + p_1(t - 2)$,
$s_3(t) = p_1(t - 1) + p_1(t - 3)$.

(a) Find signal space representations for each signal set with respect to the orthonormal basis
$\{p_1(t - i), i = 0, 1, 2, 3\}$.

(b)  Find  union  bounds  on  the  average  error  probabilities  for  both  signal  sets  as  a  function  of 
Eb/No.  At  high  SNR,  what  is  the  penalty  in  dB  for  using  signal  set  B? 

(c)  Find  an  exact  expression  for  the  average  error  probability  for  signal  set  B  as  a  function  of 
Eb/N0. 


Figure  6.32:  Signal  Set  for  Problem  6.18 


Problem  6.18  Consider  the  4-ary  signaling  set  shown  in  Figure  6.32,  to  be  used  over  an  AWGN 
channel. 

(a)  Find  a  union  bound,  as  a  function  of  Eb/No,  on  the  conditional  probability  of  error  given 
that  c(t)  is  sent. 

(b) True or False: This constellation is more power efficient than QPSK. Justify your answer.


Figure 6.33: Signal constellations for Problem 6.19 (panel labels: QAM1, QAM2)


Problem  6.19  Three  8-ary  signal  constellations  are  shown  in  Figure  6.33. 

(a) Express $R$ and $d_{\min}^{(2)}$ in terms of $d_{\min}^{(1)}$ so that all three constellations have the same $E_b$.

(b)  For  a  given  Eb/No,  which  constellation  do  you  expect  to  have  the  smallest  bit  error  probability 
over  a  high  SNR  AWGN  channel? 

(c)  For  each  constellation,  determine  whether  you  can  label  signal  points  using  3  bits  so  that  the 
label  for  nearest  neighbors  differs  by  at  most  one  bit.  If  so,  find  such  a  labeling.  If  not,  say  why 
not  and  find  some  “good”  labeling. 

(d)  For  the  labelings  found  in  part  (c),  compute  nearest  neighbors  approximations  for  the  average 
bit error probability as a function of $E_b/N_0$ for each constellation. Evaluate these approximations
for  Eb/N0  =  15  dB. 

Problem  6.20  Consider  the  signal  constellation  shown  in  Figure  6.34,  which  consists  of  two 
QPSK  constellations  of  different  radii,  offset  from  each  other  by  The  constellation  is  to  be 
used  to  communicate  over  a  passband  AWGN  channel. 


Figure  6.34:  Constellation  for  Problem  6.20 


(a) Carefully redraw the constellation (roughly to scale, to the extent possible) for $r = 1$ and
$R = \sqrt{2}$. Sketch the ML decision regions.

(b) For $r = 1$ and $R = \sqrt{2}$, find an intelligent union bound for the conditional error probability,
given that a signal point from the inner circle is sent, as a function of $E_b/N_0$.

(c) How would you choose the parameters $r$ and $R$ so as to optimize the power efficiency of the
constellation (at high SNR)?

Problem 6.21 (Exact symbol error probabilities for rectangular constellations) Assuming
each symbol is equally likely, derive the following expressions for the average error probability
for 4PAM and 16QAM:

$$P_e = \frac{3}{2} Q\left(\sqrt{\frac{4 E_b}{5 N_0}}\right) \quad \text{symbol error probability for 4PAM} \quad (6.90)$$

$$P_e = 3 Q\left(\sqrt{\frac{4 E_b}{5 N_0}}\right) - \frac{9}{4} Q^2\left(\sqrt{\frac{4 E_b}{5 N_0}}\right) \quad \text{symbol error probability for 16QAM} \quad (6.91)$$


(Assume  4PAM  with  equally  spaced  levels  symmetric  about  the  origin,  and  rectangular  16QAM 
equivalent  to  two  4PAM  constellations  independently  modulating  the  I  and  Q  components.) 
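
As a numerical sanity check on (6.90) and (6.91), the following Python sketch evaluates both expressions and compares the 4PAM formula with a small Monte Carlo simulation. It assumes levels {±1, ±3}, so that the average symbol energy is 5 and the energy per bit is 2.5. The function names (`q_function`, `pe_4pam`, `pe_16qam`, `simulate_4pam`) are illustrative choices, not notation from the text.

```python
import math
import random


def q_function(x):
    """Gaussian tail probability Q(x) = P[N(0,1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def pe_4pam(ebn0):
    """Symbol error probability (6.90); ebn0 is Eb/N0 on a linear scale."""
    return 1.5 * q_function(math.sqrt(0.8 * ebn0))


def pe_16qam(ebn0):
    """Symbol error probability (6.91); ebn0 is Eb/N0 on a linear scale."""
    q = q_function(math.sqrt(0.8 * ebn0))
    return 3.0 * q - 2.25 * q * q


def simulate_4pam(ebn0, num_symbols=100_000, seed=0):
    """Monte Carlo estimate of the 4PAM symbol error rate with ML decisions."""
    rng = random.Random(seed)
    levels = (-3.0, -1.0, 1.0, 3.0)
    # Es = 5 and Eb = Es / 2 = 2.5, so sigma^2 = N0 / 2 = 2.5 / (2 * ebn0).
    sigma = math.sqrt(2.5 / (2.0 * ebn0))
    errors = 0
    for _ in range(num_symbols):
        sent = rng.choice(levels)
        y = sent + rng.gauss(0.0, sigma)
        decided = min(levels, key=lambda s: abs(y - s))  # nearest level
        errors += decided != sent
    return errors / num_symbols


if __name__ == "__main__":
    ebn0 = 10 ** (6.0 / 10.0)  # 6 dB
    print(pe_4pam(ebn0), simulate_4pam(ebn0), pe_16qam(ebn0))
```

The 16QAM expression agrees with $1 - (1 - P_{e,\mathrm{4PAM}})^2$, which is what independent 4PAM decisions on the I and Q components would give.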


Figure 6.35: Constellation for Problem 6.22


Problem  6.22  The  signal  constellation  shown  in  Figure  6.35  is  obtained  by  moving  the  outer 
corner  points  in  rectangular  16QAM  to  the  I  and  Q  axes. 

(a)  Sketch  the  ML  decision  regions. 

(b)  Is  the  constellation  more  or  less  power-efficient  than  rectangular  16QAM? 

Problem  6.23  Consider  a  16-ary  signal  constellation  with  4  signals  with  coordinates  (±1,  ±1), 
four  others  with  coordinates  (±3,  ±3),  and  two  each  having  coordinates  (±3,  0),  (±5,  0),  (0,  ±3), 
and  (0,±5),  respectively. 

(a)  Sketch  the  signal  constellation  and  indicate  the  ML  decision  regions. 

(b)  Find  an  intelligent  union  bound  on  the  average  symbol  error  probability  as  a  function  of 
Eb/N0. 

(c)  Find  the  nearest  neighbors  approximation  to  the  average  symbol  error  probability  as  a  func¬ 
tion  of  Eb/N0. 

(d)  Find  the  nearest  neighbors  approximation  to  the  average  symbol  error  probability  for  16QAM 
as  a  function  of  Eb/No. 

(e)  Comparing  (c)  and  (d)  (i.e.,  comparing  the  performance  at  high  SNR),  which  signal  set  is 
more  power  efficient? 


Problem  6.24  A  QPSK  demodulator  is  designed  to  put  out  an  erasure  when  the  decision  is 
ambivalent.  Thus,  the  decision  regions  are  modified  as  shown  in  Figure  6.36,  where  the  cross- 
hatched  region  corresponds  to  an  erasure.  Set  a  =  //-,  where  0  <  a  <  1. 

(a)  Use  the  intelligent  union  bound  to  find  approximations  to  the  probability  p  of  symbol  error 
and  the  probability  q  of  symbol  erasure  in  terms  of  Eb/No  and  a. 

(b) Find exact expressions for p and q as functions of $E_b/N_0$ and a.


Figure 6.36: QPSK with erasures


(c)  Using  the  approximations  in  (a),  find  an  approximate  value  for  a  such  that  q  =  2p  for 
Eb/N0  =  MB. 

Remark:  The  motivation  for  (c)  is  that  a  typical  error-correcting  code  can  correct  twice  as  many 
erasures  as  errors. 


Figure  6.37:  Constellation  for  Problem  6.25 


Problem  6.25  The  constellation  shown  in  Figure  6.37  consists  of  two  QPSK  constellations  lying 
on  concentric  circles,  with  inner  circle  of  radius  r  and  outer  circle  of  radius  R. 

(a) For $r = 1$ and $R = 2$, redraw the constellation, and carefully sketch the ML decision regions.

(b)  Still  keeping  r  =  1  and  R  =  2,  find  an  intelligent  union  bound  for  the  symbol  error  probability 
as  a  function  of  Eb/N0. 

(c)  For  r  =  1,  find  the  best  choice  of  R  in  terms  of  high  SNR  performance.  Compute  the  gain  in 
power  efficiency  (in  dB),  if  any,  over  the  setting  in  (a)-(b). 

Problem 6.26 Consider the constant modulus constellation shown in Figure 6.38, where
$\theta < \pi/4$. Each symbol is labeled by 2 bits $(b_1, b_2)$ as shown. Assume that the constellation is used
over a complex baseband AWGN channel with noise Power Spectral Density (PSD) $N_0/2$ in each
dimension. Let $(\hat{b}_1, \hat{b}_2)$ denote the maximum likelihood (ML) estimates of $(b_1, b_2)$.

(a) Find $P_{e1} = P[\hat{b}_1 \neq b_1]$ and $P_{e2} = P[\hat{b}_2 \neq b_2]$ as a function of $E_s/N_0$, where $E_s$ denotes the
signal energy.

(b) Assume now that the transmitter is being heard by two receivers, R1 and R2, and that R2 is
twice as far away from the transmitter as R1. Assume that the received signal energy falls off as
$1/r^4$, where $r$ is the distance from the transmitter, and that the noise PSD for both receivers is
identical. Suppose that R1 can demodulate both bits $b_1$ and $b_2$ with error probability at least as
good as $10^{-3}$, i.e., so that $\max\{P_{e1}(R1), P_{e2}(R1)\} = 10^{-3}$. Design the signal constellation (i.e.,
specify $\theta$) so that R2 can demodulate at least one of the bits with the same error probability,
i.e., such that $\min\{P_{e1}(R2), P_{e2}(R2)\} = 10^{-3}$.


Figure 6.38: Signal constellation with unequal error protection (Problem 6.26).

Remark:  You  have  designed  an  unequal  error  protection  scheme  in  which  the  receiver  that  sees 
a poorer channel can still extract part of the information sent.


Figure 6.39: Constellation for Problem 6.27


Problem  6.27  The  2-dimensional  constellation  shown  in  Figure  6.39  is  to  be  used  for  signaling 
over  an  AWGN  channel. 

(a) Specify the ML decision if the observation is $(I, Q) = (1, -1)$.

(b)  Carefully  redraw  the  constellation  and  sketch  the  ML  decision  regions. 

(c) Find an intelligent union bound for the symbol error probability conditioned on $s_0$ being sent,
as  a  function  of  Eb/N0. 

Problem 6.28 (Demodulation with amplitude mismatch) Consider a 4PAM system using
the constellation points {±1, ±3}. The receiver has an accurate estimate of its noise level.
An automatic gain control (AGC) circuit is supposed to scale the decision statistics so that the
noiseless constellation points are in {±1, ±3}. ML decision boundaries are set according to this
nominal scaling.

(a) Suppose that the AGC scaling is faulty, and the actual noiseless signal points are at {±0.9, ±2.7}.
Sketch the points and the mismatched decision regions. Find an intelligent union bound for the
symbol error probability in terms of the $Q$ function and $E_b/N_0$.

(b) Repeat (a), assuming that faulty AGC scaling puts the noiseless signal points at {±1.1, ±3.3}.

(c)  AGC  circuits  try  to  maintain  a  constant  output  power  as  the  input  power  varies,  and  can  be 
viewed  as  imposing  a  scale  factor  on  the  input  inversely  proportional  to  the  square  root  of  the 
input  power.  In  (a),  does  the  AGC  circuit  overestimate  or  underestimate  the  input  power? 

Problem  6.29  (Demodulation  with  phase  mismatch)  Consider  a  BPSK  system  in  which 
the receiver’s estimate of the carrier phase is off by $\theta$.

(a)  Sketch  the  I  and  Q  components  of  the  decision  statistic,  showing  the  noiseless  signal  points 
and  the  decision  region. 

(b) Derive the BER as a function of $\theta$ and $E_b/N_0$ (assume that $\theta$ < |).

(c) Assuming now that $\theta$ is a random variable taking values uniformly in [— f,f], numerically
compute the BER averaged over $\theta$, and plot it as a function of $E_b/N_0$. Plot the BER without
phase  mismatch  as  well,  and  estimate  the  dB  degradation  due  to  the  phase  mismatch. 

Problem 6.30 (Simplex signaling set) Let $s_0(t), \ldots, s_{M-1}(t)$ denote a set of equal energy,
orthogonal signals. Construct a new $M$-ary signal set from these by subtracting out
the average of the $M$ signals from each signal:

$$u_k(t) = s_k(t) - \frac{1}{M} \sum_{j=0}^{M-1} s_j(t)$$

This is called the simplex signaling set.

(a) Find a union bound on the symbol error probability, as a function of $E_b/N_0$ and $M$, for
signaling over the AWGN channel using the signal set $\{u_k(t), k = 0, 1, \ldots, M - 1\}$.

(b)  Compare  the  power  efficiencies  of  the  simplex  and  orthogonal  signaling  sets  for  a  given  M, 
and  use  these  to  estimate  the  performance  difference  in  dB  between  these  two  signaling  schemes, 
for $M = 4, 8, 16, 32$. What happens as $M$ gets large?

(c)  Use  computer  simulations  to  plot,  for  M  =  4,  the  error  probability  (log  scale)  versus  Eb/N0 
(dB)  of  the  simplex  signaling  set  and  the  corresponding  orthogonal  signaling  set.  Are  your  results 
consistent  with  the  prediction  from  (b)? 
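
The construction in Problem 6.30 can be explored numerically before attempting the analysis. The Python sketch below assumes the orthogonal signals are represented by scaled unit vectors in an $M$-dimensional signal space (one convenient choice of basis, not something the problem mandates), subtracts their average, and lets you inspect the resulting energies, correlations and distances. The helper names `simplex_from_orthogonal` and `inner` are hypothetical.

```python
import math


def simplex_from_orthogonal(m, energy=1.0):
    """Return the M simplex vectors built from M orthogonal vectors of equal energy."""
    amp = math.sqrt(energy)
    # Orthogonal set: s_k has amplitude sqrt(energy) in coordinate k, zero elsewhere.
    orthogonal = [[amp if i == k else 0.0 for i in range(m)] for k in range(m)]
    # Average of the M signals, coordinate by coordinate.
    mean = [sum(col) / m for col in zip(*orthogonal)]
    # Subtract the average from each signal.
    return [[x - mu for x, mu in zip(s, mean)] for s in orthogonal]


def inner(u, v):
    """Inner product of two signal vectors."""
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    for m in (4, 8, 16, 32):
        u = simplex_from_orthogonal(m)
        diff = [a - b for a, b in zip(u[0], u[1])]
        print(m, inner(u[0], u[0]), inner(u[0], u[1]), inner(diff, diff))
```

Printing the energy of each new signal alongside the pairwise squared distance is a quick way to see where any power-efficiency gain over the orthogonal set comes from.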

Problem  6.31  (Soft  decisions  for  BPSK)  Consider  a  BPSK  system  in  which  0  and  1  are 
equally  likely  to  be  sent,  with  0  mapped  to  +1  and  1  to  -1  as  usual.  Thus,  the  decision  statistic 
$Y = A + N$ if 0 is sent, and $Y = -A + N$ if 1 is sent, where $A > 0$ and $N \sim N(0, \sigma^2)$.

(a) Show that the LLR is conditionally Gaussian given the transmitted bit, and that the conditional
distribution is scale-invariant, depending only on $E_b/N_0$.

(b)  If  the  BER  for  hard  decisions  is  10%,  specify  the  conditional  distribution  of  the  LLR,  given 
that  0  is  sent. 

Problem  6.32  (Soft  decisions  for  PAM)  Consider  soft  decisions  for  4PAM  signaling  as  in 
Example 6.1.3. Assume that the signals have been scaled to ±1, ±3 (i.e., set $A = 1$ in Example
6.1.3). The system is operating at $E_b/N_0$ of 6 dB. Bits $b_1, b_2 \in \{0, 1\}$ are mapped to the symbols
using Gray coding. Assume that $(b_1, b_2) = (0, 0)$ for symbol -3, and $(1, 0)$ for symbol +3.

(a)  Sketch  the  constellation,  along  with  the  bit  maps.  Indicate  the  ML  hard  decision  boundaries. 

(b) Find the posterior symbol probability $P[-3|y]$ as a function of the noisy observation $y$. Plot
it as a function of $y$.

Hint: The noise variance $\sigma^2$ can be inferred from the signal levels and SNR.

(c) Find $P[b_1 = 1|y]$ and $P[b_2 = 1|y]$, and plot as a function of $y$.

Remark: The posterior probability of $b_1 = 1$ equals the sum of the posterior probabilities of all
symbols which have $b_1 = 1$ in their labels.

(d) Display the results of part (c) in terms of LLRs.

Plot the LLRs as a function of $y$, saturating the values at ±50.

(e)  Try  other  values  of  Eb/No  (e.g.,  0  dB,  10  dB).  Comment  on  any  trends  you  notice.  How  do 
the  LLRs  vary  as  a  function  of  distance  from  the  noiseless  signal  points?  How  do  they  vary  as 
you change $E_b/N_0$?

(f)  In  order  to  characterize  the  conditional  distribution  of  the  LLRs,  simulate  the  system  over 
multiple  symbols  at  Eb/N0  such  that  the  BER  is  about  5%.  Plot  the  histograms  of  the  LLRs 
for  each  of  the  two  bits,  and  comment  on  whether  they  look  Gaussian.  What  happens  as  you 
increase or decrease $E_b/N_0$?


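Problem 6.33 ($M$-ary orthogonal signaling performance as $M \to \infty$) We wish to derive
the result that

$$\lim_{M \to \infty} P(\text{correct}) = \begin{cases} 1, & \frac{E_b}{N_0} > \ln 2 \\ 0, & \frac{E_b}{N_0} < \ln 2 \end{cases} \quad (6.92)$$

(a) Show that

$$P(\text{correct}) = \int_{-\infty}^{\infty} \left[ \Phi\left( x + \sqrt{\frac{2 E_b \log_2 M}{N_0}} \right) \right]^{M-1} \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \, dx$$

(b) Show that, for any $x$,

$$\lim_{M \to \infty} \left[ \Phi\left( x + \sqrt{\frac{2 E_b \log_2 M}{N_0}} \right) \right]^{M-1} = \begin{cases} 0, & \frac{E_b}{N_0} < \ln 2 \\ 1, & \frac{E_b}{N_0} > \ln 2 \end{cases}$$

Hint: Use L’Hospital’s rule on the log of the expression whose limit is to be evaluated.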

(c)  Substitute  (b)  into  the  integral  in  (a)  to  infer  the  desired  result. 
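
Before proving the limit, it can help to see the threshold at $\ln 2$ numerically. The Python sketch below evaluates the integral in part (a) by the trapezoidal rule, assuming the integrand is $\Phi$ raised to the power $M - 1$ and weighted by the standard Gaussian density. The function names, the step size and the integration limits are illustrative assumptions.

```python
import math


def q_function(z):
    """Gaussian tail probability Q(z) = 1 - Phi(z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def p_correct(ebn0, m, step=0.01, limit=10.0):
    """Trapezoidal estimate of P(correct) for M-ary orthogonal signaling.

    ebn0 is Eb/N0 on a linear scale (not dB); m is the number of signals.
    """
    shift = math.sqrt(2.0 * ebn0 * math.log2(m))
    n = int(round(2.0 * limit / step))
    total = 0.0
    for i in range(n + 1):
        x = -limit + i * step
        q = q_function(x + shift)
        if q >= 1.0:
            continue  # Phi is numerically zero here, so the term vanishes
        # Phi^(M-1) computed as exp((M-1) * log(1 - Q)) for numerical stability.
        power = math.exp((m - 1) * math.log1p(-q))
        density = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * power * density
    return total * step


if __name__ == "__main__":
    for ebn0 in (0.3, 2.0):  # one value below ln 2, one above
        print(ebn0, [round(p_correct(ebn0, 2 ** k), 4) for k in (1, 4, 10, 16)])
```

Convergence in $M$ is slow, so the printed values only show the direction of the trend on either side of $\ln 2 \approx 0.693$, not the limiting values themselves.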


Problem  6.34  (Effect  of  Rayleigh  fading)  Constructive  and  destructive  interference  between 
multiple  paths  in  wireless  systems  lead  to  large  fluctuations  in  received  amplitude,  modeled  as  a 
Rayleigh  random  variable  A  (see  Problem  5.21  for  a  definition).  The  energy  per  bit  is  therefore 
proportional  to  A2,  which,  using  Problem  5.21(c),  is  an  exponential  random  variable.  Thus, 
we can model $E_b/N_0$ as an exponential random variable with mean $\bar{E}_b/N_0$, where $\bar{E}_b$ is the
average energy per bit. Simplify notation by setting $E_b/N_0 = X$, and the mean $\bar{E}_b/N_0 = 1/\mu$, so that
$X \sim \text{Exp}(\mu)$.

(a) Show that the average error probability for BPSK with Rayleigh fading can be written as

$$P_e = \int_0^{\infty} Q\left(\sqrt{2x}\right) \mu e^{-\mu x} \, dx$$

Hint: The error probability for BPSK is given by $Q\left(\sqrt{2 E_b/N_0}\right)$, where $E_b/N_0$ is a random variable.
We now find the expected error probability by averaging over the distribution of $E_b/N_0$.

(b) Integrating by parts and simplifying, show that the average error probability can be written
as

$$P_e = \frac{1}{2}\left(1 - (1 + \mu)^{-1/2}\right) = \frac{1}{2}\left(1 - \left(1 + \frac{N_0}{\bar{E}_b}\right)^{-1/2}\right)$$

Hint: $Q(x)$ is defined via an integral, so we can find its derivative (when integrating by parts)
using the fundamental theorem of calculus.

(c) Using the approximation that $(1 \pm a)^b \approx 1 \pm ba$ for $|a|$ small, show that

$$P_e \approx \frac{1}{4 (\bar{E}_b/N_0)}$$

at high SNR. Comment on how this decay of error probability with the reciprocal of SNR
compares with the decay for the AWGN channel.

(d) Plot the error probability versus $\bar{E}_b/N_0$ for BPSK over the AWGN and Rayleigh fading channels
(BER on log scale, $\bar{E}_b/N_0$ in dB). Note that $\bar{E}_b = E_b$ for the AWGN channel. At BER of $10^{-3}$, what
is the degradation in dB due to Rayleigh fading?
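
A numerical cross-check of the Rayleigh fading results can be sketched in Python as follows. It assumes the averaged error probability is the integral of $Q(\sqrt{2x})$ against an exponential density with mean $\bar{E}_b/N_0$, and compares a trapezoidal evaluation of that integral with the closed form from the integration-by-parts step. The function names and the substitution $x = t^2$ (used only to remove the square-root behavior at the origin) are choices made for this sketch.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def pe_rayleigh_closed_form(mean_ebn0):
    """Closed form (1/2) * (1 - (1 + mu)^(-1/2)) with mu = 1 / mean_ebn0 (linear scale)."""
    mu = 1.0 / mean_ebn0
    return 0.5 * (1.0 - (1.0 + mu) ** -0.5)


def pe_rayleigh_numerical(mean_ebn0, step=1e-3, t_max=10.0):
    """Trapezoidal estimate of the averaged BPSK error probability.

    With x = t^2, the integral over x of Q(sqrt(2x)) * mu * exp(-mu x) becomes
    the integral over t of Q(sqrt(2) t) * mu * exp(-mu t^2) * 2t.
    """
    mu = 1.0 / mean_ebn0
    n = int(round(t_max / step))
    total = 0.0
    for i in range(n + 1):
        t = i * step
        f = q_function(math.sqrt(2.0) * t) * mu * math.exp(-mu * t * t) * 2.0 * t
        total += (0.5 if i in (0, n) else 1.0) * f
    return total * step


def pe_awgn_bpsk(ebn0):
    """BPSK error probability over the AWGN channel, for comparison."""
    return q_function(math.sqrt(2.0 * ebn0))


if __name__ == "__main__":
    for snr_db in (0, 10, 20, 30):
        g = 10 ** (snr_db / 10.0)
        print(snr_db, pe_rayleigh_closed_form(g), pe_rayleigh_numerical(g),
              1.0 / (4.0 * g), pe_awgn_bpsk(g))
```

The fourth printed column is the high-SNR approximation; comparing it with the AWGN column makes the inverse-SNR decay under fading, versus the much faster decay without fading, easy to see.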


Link  budget  analysis 

Problem 6.35 You are given an AWGN channel of bandwidth 3 MHz. Assume that implementation
constraints dictate an excess bandwidth of 50%. Find the achievable bit rate, the $E_b/N_0$
required for a BER of $10^{-8}$, and the receiver sensitivity (assuming a receiver noise figure of 7
dB) for the following modulation schemes, assuming that the bit-to-symbol map is optimized to
minimize the BER whenever possible:

(a)  QPSK,  (b)  8PSK,  (c)  64QAM  (d)  Coherent  16-ary  orthogonal  signaling. 

Remark:  Use  nearest  neighbors  approximations  for  the  BER. 

Problem  6.36  Consider  the  setting  of  Example  6.5.1. 

(a)  For  all  parameters  remaining  the  same,  find  the  range  and  bit  rate  when  using  a  64QAM 
constellation. 

(b)  Suppose  now  that  the  channel  model  is  changed  from  AWGN  to  Rayleigh  fading  (see  Problem 
6.34). Find the receiver sensitivity required for QPSK at BER of $10^{-5}$. (In practice, we would
shoot for a higher uncoded BER, and apply channel coding, but we discuss such methods in later
chapters.) What is the range, assuming all other parameters are as in Example 6.5.1? How does
the range change if you reduce the link margin to 10 dB (now that fading is being accounted for,
there are fewer remaining uncertainties)?

Problem  6.37  Consider  a  line-of-sight  communication  link  operating  in  the  60  GHz  band  (where 
large amounts of unlicensed bandwidth have been set aside by regulators). From version 1 of
the Friis formula (6.83), we see that the received power scales as $\lambda^2$, and hence as the inverse
square of the carrier frequency, so that 60 GHz links have much worse propagation than, say, 5
GHz links when antenna gains are fixed. However, from (6.82), we see that we can get much
better antenna gains at small carrier wavelengths for a fixed form factor, and version 2 of the
Friis formula (6.84) shows that the received power scales as $1/\lambda^2$, which improves with increasing
carrier  frequency.  Furthermore,  electronically  steerable  antenna  arrays  with  high  gains  can  be 
implemented  with  compact  form  factor  (e.g.,  patterns  of  metal  on  circuit  board)  at  higher  carrier 
frequencies  such  as  60  GHz.  Suppose,  now,  that  we  wish  to  design  a  2  Gbps  link  using  QPSK 
with  an  excess  bandwidth  of  50%.  The  receiver  noise  figure  is  8  dB,  and  the  desired  link  margin 
is  10  dB. 

(a) What is the transmit power in dBm required to attain a range of 10 meters (e.g., for in-room
communication),  assuming  that  the  transmit  and  receive  antenna  gains  are  each  10  dBi? 

(b)  For  a  transmit  power  of  20  dBm,  what  are  the  antenna  gains  required  at  the  transmitter  and 
receiver  (assume  that  the  gains  at  both  ends  are  equal)  to  attain  a  range  of  200  meters  (e.g.,  for 
an  outdoor  last-hop  link)? 

(c)  For  the  antenna  gains  found  in  (b),  what  happens  to  the  attainable  range  if  you  account  for 
additional path loss due to oxygen absorption (typical in the 60 GHz band) of 16 dB/km?

(d)  In  (c),  what  happens  to  the  attainable  range  if  there  is  a  further  path  loss  of  30  dB/km  due 
to  heavy  rain  (on  top  of  the  loss  due  to  oxygen  absorption)? 
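
For parts (c) and (d), the range no longer has a closed form once a loss in dB/km is added to free-space propagation, so a small numerical solver is convenient. The Python sketch below assumes the standard free-space form of the Friis formula, $P_{RX} = P_{TX} G_{TX} G_{RX} \lambda^2 / (16 \pi^2 R^2)$, which is taken here to correspond to (6.83) since that equation is not reproduced in this section. The function names and the notion of an "allowed loss" (transmit power plus antenna gains minus receiver sensitivity minus link margin, all in dB) are hypothetical conveniences for this sketch.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def path_loss_db(range_m, freq_hz, absorption_db_per_km=0.0):
    """Free-space path loss in dB plus a distance-proportional absorption term."""
    wavelength = SPEED_OF_LIGHT / freq_hz
    free_space = 20.0 * math.log10(4.0 * math.pi * range_m / wavelength)
    return free_space + absorption_db_per_km * range_m / 1000.0


def max_range_m(allowed_loss_db, freq_hz, absorption_db_per_km=0.0,
                lo=1e-3, hi=1e6):
    """Largest range (meters) whose path loss does not exceed allowed_loss_db.

    Uses bisection on a logarithmic scale; the path loss is increasing in range.
    """
    for _ in range(200):
        mid = math.sqrt(lo * hi)  # geometric midpoint
        if path_loss_db(mid, freq_hz, absorption_db_per_km) > allowed_loss_db:
            hi = mid
        else:
            lo = mid
    return lo


if __name__ == "__main__":
    f = 60e9
    budget = path_loss_db(200.0, f)  # loss tolerated by a link that just reaches 200 m
    print(max_range_m(budget, f))        # no absorption
    print(max_range_m(budget, f, 16.0))  # oxygen absorption only
    print(max_range_m(budget, f, 46.0))  # oxygen absorption plus heavy rain
```

The allowed loss itself still has to come from the link budget (receiver sensitivity, noise figure, margin and antenna gains), which this sketch deliberately leaves as an input.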

Problem  6.38  A  10  Mbps  line-of-sight  communication  link  operating  at  a  carrier  frequency  of 
1  GHz  has  a  designed  range  of  5  km.  The  link  employs  16QAM  with  an  excess  bandwidth  of 
25%, with a designed BER of $10^{-6}$ and a link margin of 10 dB. The receiver noise figure is 4
dB, and the transmit and receive antenna gains are 10 dBi each. This is the baseline scenario
against which each of the scenarios in (a)-(c) is to be compared.

(a)  Suppose  that  you  change  the  carrier  frequency  to  5  GHz,  keeping  all  other  link  parameters 
the  same.  What  is  the  new  range? 

(b)  Suppose  that  you  change  the  carrier  frequency  to  5  GHz  and  increase  the  transmit  and  receive 
antenna  gains  by  3  dBi  each,  keeping  all  other  link  parameters  the  same.  What  is  the  new  range? 

(c)  Suppose  you  change  the  carrier  frequency  to  5  GHz,  increase  the  transmit  and  receive  antenna 
directivities  by  3  dBi  each,  and  increase  the  data  rate  to  40  Mbps,  still  using  16QAM  with  excess 
bandwidth  of  25%.  All  other  link  parameters  are  the  same.  What  is  the  new  range? 


Software  Lab  6.1:  Linear  modulation  with  two-dimensional 
constellations 

Lab  Objectives:  This  is  a  follow-on  to  Software  Lab  4.1,  the  code  from  which  is  our  starting 
point  here.  The  objective  is  to  implement  in  complex  baseband  a  linearly  modulated  system  for 
a  variety  of  signal  constellations.  We  wish  to  estimate  the  performance  of  these  schemes  for  an 
ideal  channel  via  simulation,  and  to  compare  with  analytical  expressions.  As  in  Software  Lab 
4.1, we use a trivial channel filter in this lab. The model is extended to dispersive channels in
Software  Lab  8.1  in  Chapter  8. 

Reading:  Basic  modulation  formats  from  Chapter  4,  and  error  probability  expressions  from 
Chapter  6. 


Laboratory  Assignment 

0)  Use  the  code  for  Software  Lab  4.1  as  a  starting  point. 

1)  Write  a  matlab  function  randbit  that  generates  random  bits  taking  values  in  {0, 1}  (not  ±1) 
with  equal  probability. 

2)  Write  the  following  functions  mapping  bits  to  symbols  for  different  signal  constellations.  Write 
the functions to allow for vector inputs and outputs. The mapping is said to be a Gray
code, or Gray labeling, if the bit labels of nearest neighbors in the constellation differ in exactly
one bit. In all of the following, choose the bit map to be a Gray code.

(a) bpskmap: input a 0/1 bit, output a ±1 symbol.

(b) qpskmap: input 2 0/1 bits, output a symbol taking one of four values in {±1 ± j}.

(c) fourpammap: input 2 0/1 bits, output a symbol taking one of four values in {±1, ±3}.

(d) sixteenqammap: input 4 0/1 bits, output a symbol taking one of 16 values in $\{b_c + j b_s : b_c, b_s \in \{\pm 1, \pm 3\}\}$.

(e) eightpskmap: input 3 0/1 bits, output a symbol taking one of 8 values in $\{e^{j 2\pi i/8},\ i = 0, 1, \ldots, 7\}$.

3)  BPSK  symbol  generation:  Use  part  1  to  generate  12000  0/1  bits.  Map  these  to  BPSK  (±1) 
bits  using  bpskmap.  Pass  these  through  the  transmit  and  receive  filter  in  lab  1  to  get  noiseless 
received  samples  at  rate  4/T,  as  before. 

4) Adding noise: We consider discrete time additive white Gaussian noise (AWGN). At the input
to the receive filter, add independent and identically distributed (iid) complex Gaussian noise,
such that the real and imaginary parts of each sample are iid $N(0, \sigma^2)$ (you will choose $\sigma^2 = N_0/2$
corresponding to a specified value of $E_b/N_0$, as described in part 5). Pass these (rate 4/T) noise
samples through the receive filter, and add the result to the output of part 3.

Remark: If the nth transmitted symbol is b[n], the average received energy per symbol is
$E_s = E[|b[n]|^2]\,\|g_T * g_C\|^2$. Divide that by the number of bits per symbol to get $E_b$. The
noise variance per dimension is $\sigma^2 = N_0/2$. This enables you to compute $E_b/N_0$ for your simulation
model. The signal-to-noise ratio $E_b/N_0$ is usually expressed in decibels (dB): $E_b/N_0$ (dB) =
$10 \log_{10} E_b/N_0$ (raw). Thus, if you fix the transmit and channel filter coefficients, then you can
simulate any given value of $E_b/N_0$ in dB by varying the value of the noise variance $\sigma^2$.
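The conversion described in the remark can be sketched as follows (in Python rather than Matlab, for illustration). The function name and arguments are hypothetical, and the sketch assumes the convention that the noise variance per dimension equals $N_0/2$ and that the received symbol energy $E_s$ has already been computed for your filters and constellation.

```python
def noise_variance(ebn0_db, es, bits_per_symbol):
    """Noise variance per dimension for a target Eb/N0 in dB.

    es is the average received energy per symbol (assumed to be known),
    and the convention sigma^2 = N0/2 is assumed.
    """
    eb = es / bits_per_symbol            # energy per bit
    ebn0_raw = 10 ** (ebn0_db / 10)      # dB -> raw ratio
    n0 = eb / ebn0_raw
    return n0 / 2
```

For example, with symbols in {±1 ± j} ($E[|b[n]|^2] = 2$), a unit-energy cascade of transmit and channel filters, and 2 bits per symbol, we have $E_b = 1$, so that 0 dB corresponds to a variance of 0.5 per dimension.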

5) Plot the ideal bit error probability for BPSK, which is given by $Q(\sqrt{2E_b/N_0})$, on a log scale
as a function of $E_b/N_0$ in dB over the range 0-10 dB. Find the value of $E_b/N_0$ that corresponds
to an error probability of $10^{-2}$.
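A minimal sketch of the analytical computation in this part is given below, in Python for illustration. It expresses the Q function through the complementary error function, $Q(x) = \frac{1}{2}\mathrm{erfc}(x/\sqrt{2})$, and finds the required $E_b/N_0$ by bisection; the function names and the search interval of 0 to 20 dB are assumptions.

```python
import math

def qfunc(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def bpsk_ber(ebn0_db):
    """Ideal BPSK bit error probability Q(sqrt(2 Eb/N0))."""
    ebn0 = 10 ** (ebn0_db / 10)
    return qfunc(math.sqrt(2 * ebn0))

def ebn0_db_for_ber(target, ber_fn, lo=0.0, hi=20.0, tol=1e-6):
    """Bisection on a BER curve that decreases with Eb/N0 (in dB)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ber_fn(mid) > target:
            lo = mid          # error rate still too high: need more SNR
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Calling `ebn0_db_for_ber(1e-2, bpsk_ber)` gives a value of about 4.3 dB. The same helper can be reused for the other constellations by passing in a different `ber_fn`.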

6) For the value of Eb/N0 found in part 5, choose the corresponding value of $\sigma^2$ in part 4. Find
the  decision  statistics  corresponding  to  the  transmitted  symbols  at  the  input  and  output  of  the 
receive  filter,  as  in  lab  1  (parts  5  and  6).  Plot  the  imaginary  versus  the  real  parts  of  the  decision 
statistics;  you  should  see  a  noisy  version  of  the  constellation. 

7)  Using  an  appropriate  decision  rule,  make  decisions  on  the  12000  transmitted  bits  based  on 
the  12000  decision  statistics,  and  measure  the  error  probability  obtained  at  the  input  and  the 
output.  Compare  the  results  with  the  ideal  error  probability  from  part  5.  You  should  find  that 
the  error  probability  based  on  the  receiver  input  samples  is  significantly  worse  than  that  based 
on  the  receiver  output,  and  that  the  latter  is  a  little  worse  than  the  ideal  performance  because 
of  the  ISI  in  the  decision  statistics. 

8) Now, map 12000 0/1 bits into 6000 4PAM symbols using function fourpammap (use as input
2 parallel vectors of 6000 bits). As shown in Chapter 6, a good approximation (the nearest
neighbors approximation) to the ideal bit error probability for Gray coded 4PAM is given by
$Q\left(\sqrt{\frac{4E_b}{5N_0}}\right)$. As in part 5, plot this on a log scale as a function of $E_b/N_0$ in dB over the range
0-10 dB. What is the value of $E_b/N_0$ (dB) corresponding to a bit error probability of $10^{-2}$?

9) Choose the value of the noise variance $\sigma^2$ corresponding to the Eb/N0 found in part 8. Now,
find  decision  statistics  for  the  6000  transmitted  symbols  based  on  the  receive  filter  output  only. 

(a)  Plot  the  imaginary  versus  the  real  parts  of  the  decision  statistics,  as  before. 

(b)  Determine  an  appropriate  decision  rule  for  estimating  the  two  parallel  bit  streams  of  6000 
bits  from  the  6000  complex  decision  statistics. 

(c)  Measure  the  bit  error  probability,  and  compare  it  with  the  ideal  bit  error  probability. 
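One possible decision rule for part 9(b) is sketched below in Python, for illustration. It assumes a particular Gray labeling of the 4PAM levels (00 to -3, 01 to -1, 11 to +1, 10 to +3), which is a choice made here and not prescribed by the lab, and it assumes that the decision statistics have been scaled so that the noiseless values are ±1 and ±3 (the hypothetical `scale` argument handles any other gain).

```python
def fourpam_decide(stats, scale=1.0):
    """Recover two parallel bit streams from 4PAM decision statistics.

    Assumed Gray labeling: (0,0) -> -3, (0,1) -> -1, (1,1) -> +1, (1,0) -> +3.
    Only the real part of each statistic is used.
    """
    # First bit: sign of the statistic (threshold at 0).
    b1 = [1 if z.real > 0 else 0 for z in stats]
    # Second bit: inner versus outer level (thresholds at +-2 * scale).
    b2 = [1 if abs(z.real) < 2 * scale else 0 for z in stats]
    return b1, b2

def bit_error_rate(sent, decided):
    """Fraction of positions where two equal-length bit vectors differ."""
    return sum(a != b for a, b in zip(sent, decided)) / len(sent)
```

With a Gray labeling, each bit is decided by comparing against its own thresholds, so the two streams can be handled by separate vectorized comparisons.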

10) Repeat parts 8 and 9 for QPSK, the ideal bit error probability for which, as a function of
Eb/N0, is the same as for BPSK.

11) Repeat parts 8 and 9 for 16QAM (4 bit streams of length 3000 each), the ideal bit error
probability for which, as a function of Eb/N0, is the same as for 4PAM.

12) Repeat parts 8 and 9 for 8PSK (3 bit streams of length 4000 each). The ideal bit error
probability for Gray coded 8PSK is approximated by (using the nearest neighbors approximation)
$Q\left(\sqrt{\frac{(6 - 3\sqrt{2})\,E_b}{2N_0}}\right)$.

13)  Since  all  your  answers  above  will  be  off  from  the  ideal  answers  because  of  some  ISI,  run  a 
simulation  with  12000  bits  sent  using  Gray-coded  16-QAM  with  no  ISI.  To  do  this,  generate  the 
decision  statistics  by  adding  noise  directly  to  the  transmitted  symbols,  setting  the  noise  variance 
appropriately  to  operate  at  the  required  Eb/N0.  Do  this  for  two  different  values  of  Eb/N0,  the  one 
in  part  11  and  a  value  3  dB  higher.  In  each  case,  compare  the  nearest  neighbors  approximation 
to  the  measured  bit  error  probability,  and  plot  the  imaginary  versus  real  part  of  the  decision 
statistics. 
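The ISI-free experiment in part 13 can be sketched as follows, in Python for illustration. The Gray labeling, the function names, and the seed are assumptions; the noise variance is set using $E_s = 10$ for the constellation $\{b_c + j b_s\}$ with levels ±1, ±3, so that $E_b = 2.5$, together with the convention that the variance per dimension is $N_0/2$.

```python
import math
import random

# Assumed Gray labeling of the levels on each axis.
LEVELS = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}

def pam_bits(x):
    """Decide the two bits carried by one axis of the 16QAM symbol."""
    return (1 if x > 0 else 0, 1 if abs(x) < 2 else 0)

def simulate_16qam(ebn0_db, nbits=12000, seed=1):
    """Measured bit error rate with noise added directly to the symbols."""
    rng = random.Random(seed)
    eb = 10.0 / 4                                   # Es = 10, 4 bits per symbol
    sigma = math.sqrt(eb / 10 ** (ebn0_db / 10) / 2)
    errors = 0
    for _ in range(nbits // 4):
        bits = tuple(rng.randint(0, 1) for _ in range(4))
        i_stat = LEVELS[bits[0:2]] + rng.gauss(0, sigma)
        q_stat = LEVELS[bits[2:4]] + rng.gauss(0, sigma)
        decided = pam_bits(i_stat) + pam_bits(q_stat)
        errors += sum(a != b for a, b in zip(bits, decided))
    return errors / nbits

def nn_approx(ebn0_db):
    """Nearest neighbors approximation Q(sqrt(4 Eb / (5 N0)))."""
    arg = math.sqrt(0.8 * 10 ** (ebn0_db / 10))
    return 0.5 * math.erfc(arg / math.sqrt(2))
```

Running `simulate_16qam` and `nn_approx` at the same value of $E_b/N_0$, and again 3 dB higher, gives the comparison asked for. With only 12000 bits, the measured value at the higher SNR is based on few error events and should be interpreted accordingly.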

Lab  Report:  Your  lab  report  should  document  the  results  of  the  preceding  steps  in  order. 
Describe  the  reasoning  you  used  and  the  difficulties  you  encountered. 

Tips:  Vectorize  as  many  of  the  functions  as  possible,  including  both  the  bit-to-symbol  maps  and 
the decision rules. Do BPSK and 4-PAM first, where you will only use the real part of the complex
decision statistics. Leverage this for QPSK and 16-QAM, by replicating what you did for the
imaginary part of the decision statistics as well. To avoid confusion, keep different Matlab files
for simulations regarding different signal constellations, and keep the analytical computations
and plots separate from the simulations.


Software  Lab  6.2:  Modeling  and  performance  evaluation 
on  a  wireless  fading  channel 

Lab  Objectives:  Introduction  to  statistical  modeling  and  performance  evaluation  for  signaling 
on  wireless  fading  channels. 


Laboratory  Assignment 


Let us consider the following simple model of a wireless channel (obtained after filtering and
sampling at the symbol rate, and assuming that there is no ISI). If {b[n]} is the transmitted
symbol sequence, then the complex-valued received sequence is given by

$$y[n] = h[n]\,b[n] + w[n]$$ (6.93)

where $\{w[n] = w_c[n] + j w_s[n]\}$ is an iid complex Gaussian noise sequence with $w_c[n]$, $w_s[n]$ i.i.d.
$N(0, \sigma^2 = N_0/2)$ random variables. We say that w[n] has variance $\sigma^2$ per dimension. The channel
sequence {h[n]} is a time-varying sequence of complex gains.

Equation  (6.93)  models  the  channel  at  a  given  time  as  a  simple  scalar  gain  h[n] .  On  the  other 
hand,  as  discussed  in  Example  2.5.6,  a  multipath  wireless  channel  cannot  be  modeled  as  a  simple 
scalar  gain:  it  is  dispersive  in  time,  and  exhibits  frequency  selectivity.  However,  it  is  shown  in 
Chapter  8  that  we  can  decompose  complicated  dispersive  channels  into  scalar  models  by  using 
frequency-domain  modulation,  or  OFDM,  which  transmits  data  in  parallel  over  narrow  enough 
frequency  slices  such  that  the  channel  over  each  slice  can  be  modeled  as  a  complex  scalar. 
Equation  (6.93)  could  therefore  be  interpreted  as  modeling  time  variations  in  such  scalar  gains. 

Rayleigh fading: The channel gain sequence is $\{h[n] = h_c[n] + j h_s[n]\}$, where $\{h_c[n]\}$ and $\{h_s[n]\}$
are zero mean, independent and identically distributed colored Gaussian random processes. The
reason this is called Rayleigh fading is that $|h[n]| = \sqrt{h_c^2[n] + h_s^2[n]}$ is a Rayleigh random variable.
Remark: The Gaussianity arises because the overall channel gain results from a superposition of
gains from multiple reflections off scatterers.

Simulation  of  Rayleigh  fading:  We  will  use  a  simple  model  wherein  the  colored  channel  gain 
sequence  {h[n]}  is  obtained  by  passing  white  Gaussian  noise  through  a  first-order  recursive  filter, 
as  follows: 

$$h_c[n] = \rho\, h_c[n-1] + u[n]$$
$$h_s[n] = \rho\, h_s[n-1] + v[n]$$ (6.94)

where {u[n]} and {v[n]} are independent real-valued white Gaussian sequences, with i.i.d. $N(0, \beta^2)$
elements. The parameter $\rho$ ($0 < \rho < 1$) determines how rapidly the channel varies. The models for
the I and Q gains in (6.94) are examples of first-order autoregressive (AR(1)) random processes: autoregressive
because future values depend on the past in a linear fashion, and first order because
only the immediately preceding value affects the current one.

Setting  up  the  fading  simulator 

1) Set up the AR(1) Rayleigh fading model in Matlab, with $\rho$ and $\beta^2$ as programmable parameters.

2) Calculate $E[|h[n]|^2] = 2E[h_c^2[n]] = 2v^2$ analytically as a function of $\rho$ and $\beta^2$. Use simulation
to verify your results, setting $\rho = 0.99$ and $\beta = 0.01$. You may choose to initialize $h_c[0]$ and $h_s[0]$
as iid $N(0, v^2)$ in your simulation. Use at least 10,000 samples.
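A minimal sketch of the fading simulator for parts 1 and 2 follows, in Python for illustration (the function names and seed are assumptions). The check uses the stationarity condition for the recursion (6.94): if $h_c[n]$ has variance $v^2$ for all n, then $v^2 = \rho^2 v^2 + \beta^2$, which gives $v^2 = \beta^2/(1 - \rho^2)$.

```python
import math
import random

def ar1_rayleigh(n, rho, beta, seed=0):
    """Generate n samples of the complex AR(1) fading gain h[n]."""
    rng = random.Random(seed)
    v = beta / math.sqrt(1 - rho ** 2)      # stationary standard deviation
    hc, hs = rng.gauss(0, v), rng.gauss(0, v)
    h = []
    for _ in range(n):
        h.append(complex(hc, hs))
        # First-order recursion driven by white Gaussian noise, as in (6.94).
        hc = rho * hc + rng.gauss(0, beta)
        hs = rho * hs + rng.gauss(0, beta)
    return h

def average_power(h):
    """Empirical estimate of E[|h[n]|^2]."""
    return sum(abs(x) ** 2 for x in h) / len(h)

def relative_power_db(h, avg):
    """Instantaneous power relative to the average, in dB."""
    return [10 * math.log10(abs(x) ** 2 / avg) for x in h]
```

Because successive samples are strongly correlated when $\rho$ is close to one, the empirical average converges slowly: with $\rho = 0.99$, an estimate from 10,000 samples can easily be off by ten percent or so, and more samples give a tighter match.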

3) Plot the instantaneous channel power relative to the average channel power, $|h[n]|^2/(2v^2)$, in dB as
a function of n. Thus, 0 dB corresponds to the average value of $2v^2$. You will occasionally see
sharp dips in the power, which are termed deep fades.

4) Define the channel phase $\theta[n] = \mathrm{angle}(h[n]) = \tan^{-1}\frac{h_s[n]}{h_c[n]}$. Plot $\theta[n]$ versus n. Compare with
the plot in part 3; you should see sharp phase changes corresponding to deep fades.

QPSK  in  Rayleigh  fading 

Now, implement the model (6.93), where {b[n]} correspond to Gray coded QPSK, using an AR(1)
simulation of Rayleigh fading as set up in parts 1-4. Assume that the receiver has perfect knowledge of the
channel gains {h[n]}, and employs the decision statistic $Z[n] = h^*[n]\,y[n]$.

Remark:  In  practice,  the  channel  estimation  required  for  implementing  this  is  achieved  by  insert¬ 
ing  pilot  symbols  periodically  into  the  data  stream.  The  performance  will,  of  course,  be  worse 
than  with  the  ideal  channel  estimates  considered  here. 

5) Do scatter plots of the two-dimensional received symbols {y[n]}, and of the decision statistics
{Z[n]}.  What  does  multiplying  by  h*[n]  achieve? 

6)  Implement  a  decision  rule  for  the  bits  encoded  in  the  QPSK  symbols  based  on  the  statistics 
{Z[n]}.  Estimate  by  simulation,  and  plot,  the  bit  error  probability  (log  scale)  as  a  function  of 
the  average  Eb/N0  (dB),  where  Eb/N0  ranges  from  0  to  30  dB.  Use  at  least  10,000  symbols  for 
your  estimate.  On  the  same  plot,  also  plot  the  analytical  bit  error  probability  as  a  function  of 
Eb/N0  when  there  is  no  fading.  You  should  see  a  marked  degradation  due  to  fading.  How  do 
you think the error probability in fading varies with Eb/N0?

Relating simulation parameters to $E_b/N_0$: The average symbol energy is $E_s = E[|b[n]|^2]\,E[|h[n]|^2]$,
and $E_b = E_s/\log_2 M$. This is a function of the constellation scaling and the parameters $\beta^2$ and $\rho$ in
the fading simulator (see part 2). You can therefore fix $E_s$, and hence $E_b$, by fixing $\beta$, $\rho$ (e.g., as in
part 2), and fix the scaling of the {b[n]} (e.g., keep the constellation points as ±1 ± j). $E_b/N_0$
can now be varied by varying the variance $\sigma^2$ of the noise in (6.93).
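Putting these pieces together, a single-path simulation of coherent QPSK over the AR(1) fading channel might look like the following sketch (Python, for illustration; the function name, defaults, and seed are assumptions). It uses constellation points ±1 ± j, so that $E[|b[n]|^2] = 2$, the stationary value $E[|h[n]|^2] = 2\beta^2/(1-\rho^2)$, and a noise variance per dimension of $N_0/2$. Bit decisions are made on the signs of the real and imaginary parts of $h^*[n]\,y[n]$.

```python
import math
import random

def qpsk_rayleigh_ber(ebn0_db, nsym=20000, rho=0.99, beta=0.01, seed=0):
    """Simulated bit error rate of coherent QPSK in AR(1) Rayleigh fading."""
    rng = random.Random(seed)
    v2 = beta ** 2 / (1 - rho ** 2)      # stationary variance per dimension
    es = 2 * (2 * v2)                    # E[|b|^2] * E[|h|^2]
    eb = es / 2                          # 2 bits per QPSK symbol
    sigma = math.sqrt(eb / 10 ** (ebn0_db / 10) / 2)
    v = math.sqrt(v2)
    hc, hs = rng.gauss(0, v), rng.gauss(0, v)
    errors = 0
    for _ in range(nsym):
        h = complex(hc, hs)
        b = complex(rng.choice((-1, 1)), rng.choice((-1, 1)))
        w = complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
        y = h * b + w
        z = h.conjugate() * y            # undoes the channel phase
        errors += (z.real > 0) != (b.real > 0)
        errors += (z.imag > 0) != (b.imag > 0)
        hc = rho * hc + rng.gauss(0, beta)
        hs = rho * hs + rng.gauss(0, beta)
    return errors / (2 * nsym)
```

Sweeping `ebn0_db` from 0 to 30 dB and plotting the result on a log scale next to $Q(\sqrt{2E_b/N_0})$ gives the comparison requested in part 6. At high SNR, the number of symbols needs to be large enough to observe a reasonable number of errors.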

Diversity 

The severe degradation due to Rayleigh fading can be mitigated by using diversity: the probability
that two paths are simultaneously in a deep fade is smaller than the probability that a
single path is in a deep fade. Consider a receive antenna diversity system, where the received
signals $y_1$ and $y_2$ at the two antennas are given by

$$y_1[n] = h_1[n]\,b[n] + w_1[n]$$
$$y_2[n] = h_2[n]\,b[n] + w_2[n]$$ (6.95)

Thus,  you  get  two  looks  at  the  data  stream,  through  two  different  channels. 

Implement the two-fold diversity system in (6.95) as you implemented (6.93), keeping the following
in mind:

• The noises $w_1$ and $w_2$ are independent white noise sequences with variance $\sigma^2 = N_0/2$ per dimension
as before.

• The channels $h_1$ and $h_2$ are generated by passing independent white noise streams through a
first-order recursive filter. In relating the simulation parameters to $E_b/N_0$, keep in mind that the
average symbol energy now is $E_s = E[|b[n]|^2]\,E[|h_1[n]|^2 + |h_2[n]|^2]$.

• Use the following maximal ratio combining rule to obtain the decision statistic

$$Z_2[n] = h_1^*[n]\,y_1[n] + h_2^*[n]\,y_2[n]$$

The decision statistic above can be written as

$$Z_2[n] = \left(|h_1[n]|^2 + |h_2[n]|^2\right) b[n] + w[n]$$

where w[n] is zero mean complex Gaussian with variance $\sigma^2(|h_1[n]|^2 + |h_2[n]|^2)$ per dimension.
Thus, the instantaneous SNR is given by

$$\mathrm{SNR}[n] = \frac{E\left[\left|\left(|h_1[n]|^2 + |h_2[n]|^2\right) b[n]\right|^2\right]}{E[|w[n]|^2]} = \frac{\left(|h_1[n]|^2 + |h_2[n]|^2\right) E[|b[n]|^2]}{2\sigma^2}$$


7) Plot $|h_1[n]|^2 + |h_2[n]|^2$ in dB as a function of n, with 0 dB representing the average value as
before.  You  should  find  that  the  fluctuations  around  the  average  are  less  than  in  part  3. 

8) Implement a decision rule for the bits encoded in the QPSK symbols based on the statistics
$\{Z_2[n]\}$. Estimate by simulation, and plot (on the same plot as in part 6), the bit error probability
(log scale) as a function of the average Eb/N0 (dB), where Eb/N0 ranges from 0 to 30 dB.
Use at least 10,000 symbols for your estimate. You should see an improvement compared to the
situation with no diversity.
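A sketch of the diversity simulation with maximal ratio combining is given below, in Python for illustration. It is written for a hypothetical `branches` parameter so that `branches=1` corresponds to the single-path model and `branches=2` to the two-antenna system; the function name, defaults, and seed are assumptions. The symbol energy accounts for the sum of the average channel powers over all branches, so that the comparison is made at the same average $E_b/N_0$.

```python
import math
import random

def qpsk_mrc_ber(ebn0_db, branches=2, nsym=20000, rho=0.99, beta=0.01, seed=0):
    """Simulated bit error rate of QPSK with maximal ratio combining."""
    rng = random.Random(seed)
    v = beta / math.sqrt(1 - rho ** 2)
    es = 2 * branches * (2 * v * v)      # E[|b|^2] * E[sum_k |h_k|^2]
    eb = es / 2
    sigma = math.sqrt(eb / 10 ** (ebn0_db / 10) / 2)
    # Independent fading gains on each branch, started in steady state.
    h = [complex(rng.gauss(0, v), rng.gauss(0, v)) for _ in range(branches)]
    errors = 0
    for _ in range(nsym):
        b = complex(rng.choice((-1, 1)), rng.choice((-1, 1)))
        z = 0j
        for hk in h:
            y = hk * b + complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
            z += hk.conjugate() * y      # combine the branches
        errors += (z.real > 0) != (b.real > 0)
        errors += (z.imag > 0) != (b.imag > 0)
        h = [rho * hk + complex(rng.gauss(0, beta), rng.gauss(0, beta))
             for hk in h]
    return errors / (2 * nsym)
```

Comparing `qpsk_mrc_ber(x, branches=2)` against `qpsk_mrc_ber(x, branches=1)` over a range of x in dB shows the gain from diversity.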

Lab  Report:  Your  lab  report  should  document  the  results  of  the  preceding  steps  in  order. 
Describe the reasoning you used and the difficulties you encountered.

Bonus:  A  Glimpse  of  differential  modulation  and  demodulation 

Throughout this chapter, we have assumed that a noiseless “template” for the set of possible
transmitted signals is available at the receiver. In the present context, it means assuming that
estimates for the time-varying fading channel are available. But what if these estimates, which
we used to generate the decision statistics earlier in this lab, are not available? One approach that
avoids the need for explicit channel estimation is based on exploiting the fact that the channel
does not change much from symbol to symbol. Let us illustrate this for the case of QPSK. The
model is exactly as in (6.93) or (6.95), but the channel sequence(s) is (are) unknown a priori. This
necessitates encoding the data in a different way. Specifically, let d[n] be a Gray coded QPSK
information sequence, which contains information about the bits of interest. Instead of sending
d[n] directly, we generate the transmitted sequence b[n] by differential encoding as follows:

$$b[n] = d[n]\,b[n-1], \quad n = 1, 2, 3, 4, \ldots$$

(You can initialize b[0] as any element of the constellation, known by agreement to both transmitter
and receiver. Or, just ignore the first information symbol in your demodulation.) At
the receiver, use differential demodulation to generate the decision statistic for the information
symbol d[n] as follows:

$$Z^{nc}[n] = y[n]\,y^*[n-1] \quad \text{(single path)}$$

$$Z_2^{nc}[n] = y_1[n]\,y_1^*[n-1] + y_2[n]\,y_2^*[n-1] \quad \text{(dual diversity)}$$

where the superscript indicates noncoherent demodulation, i.e., demodulation that does not
require an explicit channel estimate.
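The encoding and demodulation steps can be sketched as follows, in Python for illustration. The sketch assumes that the information symbols are drawn from the unit-modulus QPSK alphabet {1, j, -1, -j}, so that the product d[n]b[n-1] stays in the same alphabet, with the Gray labeling shown in the code; both choices, as well as the function names, are assumptions made here. The final function is a noiseless sanity check over an unknown constant channel gain.

```python
import cmath
import math
import random

# Assumed Gray labeling: adjacent phases differ in exactly one bit.
GRAY = {(0, 0): 1 + 0j, (0, 1): 1j, (1, 1): -1 + 0j, (1, 0): -1j}

def diff_encode(d, b0=1 + 0j):
    """b[n] = d[n] * b[n-1], with b[0] = b0 known to both ends."""
    b = [b0]
    for dn in d:
        b.append(dn * b[-1])
    return b

def diff_demod(y):
    """Single-path statistic y[n] * conj(y[n-1]) for n = 1, 2, ..."""
    return [y[n] * y[n - 1].conjugate() for n in range(1, len(y))]

def decide(z):
    """Pick the nearest phase in {0, 90, 180, 270} degrees; return its bits."""
    r = z * cmath.exp(1j * math.pi / 4)   # rotate so decisions are quadrants
    return (1 if r.imag < 0 else 0, 1 if r.real < 0 else 0)

def roundtrip_ok(nsym=200, seed=3):
    """Noiseless check: an unknown constant channel gain drops out."""
    rng = random.Random(seed)
    bits = [(rng.randint(0, 1), rng.randint(0, 1)) for _ in range(nsym)]
    h = 0.3 * cmath.exp(1.1j)             # gain unknown to the receiver
    y = [h * x for x in diff_encode([GRAY[b] for b in bits])]
    return [decide(z) for z in diff_demod(y)] == bits
```

For dual diversity, the statistics from the two antennas are computed in the same way and added before calling `decide`.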

Lab  report  for  bonus  assignment:  Estimate  by  simulation,  and  plot,  the  bit  error  probability  of 
Gray  coded  differentially  encoded  QPSK  as  a  function  of  Eb/N0  for  both  single  path  and  dual 
diversity.  Compare  with  the  curves  for  coherent  demodulation  that  you  have  obtained  earlier. 
How  much  (in  dB)  does  the  performance  degrade  by?  Document  your  results  as  in  the  earlier 
lab  reports. 



6.A Irrelevance of component orthogonal to signal space


Conditioning on $H_i$, we have $y(t) = s_i(t) + n(t)$. The component of the received signal orthogonal
to the signal space is given by

$$y^\perp(t) = y(t) - y_S(t) = y(t) - \sum_{k=0}^{n-1} Y[k]\,\psi_k(t) = s_i(t) + n(t) - \sum_{k=0}^{n-1} \left(s_i[k] + N[k]\right)\psi_k(t)$$

But the signal $s_i(t)$ lies in the signal space, so that

$$s_i(t) - \sum_{k=0}^{n-1} s_i[k]\,\psi_k(t) = 0$$

That is, the signal contribution to $y^\perp$ is zero, and

$$y^\perp(t) = n(t) - \sum_{k=0}^{n-1} N[k]\,\psi_k(t) = n^\perp(t)$$

where $n^\perp$ denotes the noise projection orthogonal to the signal space.

We now show that $n^\perp(t)$ is independent of the signal space noise vector N. Since $n^\perp$ and N are
jointly Gaussian, it suffices to show that they are uncorrelated. For any t and k, we have

$$\mathrm{cov}(n^\perp(t), N[k]) = E[n^\perp(t)\,N[k]] = E\left[\left\{n(t) - \sum_{j=0}^{n-1} N[j]\,\psi_j(t)\right\} N[k]\right] = E[n(t)\,N[k]] - \sum_{j=0}^{n-1} E[N[j]\,N[k]]\,\psi_j(t)$$ (6.96)

The first term on the extreme right-hand side can be simplified as

$$E[n(t)\langle n, \psi_k\rangle] = E\left[n(t)\int n(s)\,\psi_k(s)\,ds\right] = \int E[n(t)\,n(s)]\,\psi_k(s)\,ds = \int \sigma^2 \delta(s-t)\,\psi_k(s)\,ds = \sigma^2 \psi_k(t)$$ (6.97)

Plugging (6.97) into (6.96), and noting that $E[N[j]\,N[k]] = \sigma^2 \delta_{jk}$, we obtain that

$$\mathrm{cov}(n^\perp(t), N[k]) = \sigma^2 \psi_k(t) - \sigma^2 \psi_k(t) = 0$$

What we have just shown is that the component of the received signal orthogonal to the signal
space contains the noise component $n^\perp$ only, and thus does not depend on which signal is sent
under a given hypothesis. Since $n^\perp$ is independent of N, the noise vector in the signal space,
knowing $n^\perp$ does not provide any information about N. These two observations imply that $y^\perp$
is irrelevant for our hypothesis testing problem. The preceding discussion is illustrated in Figure 6.9,
and enables us to reduce our infinite-dimensional problem to a finite-dimensional vector model
restricted to the signal space.

Note that our irrelevance argument depends crucially on the property of WGN that its projections
along orthogonal directions are independent. Even though $y^\perp$ does not contain any signal
component (since these by definition fall into the signal space), if $n^\perp$ and N exhibited statistical
dependence, one could hope to learn something about N from $n^\perp$, and thereby improve
performance compared to a system in which $y^\perp$ is thrown away. However, since $n^\perp$ and N are
independent for WGN, we can restrict attention to the signal space for our hypothesis testing
problem.

Chapter 7

Channel  Coding 


We  have  seen  in  Chapter  6  that,  for  signaling  over  an  AWGN  channel,  the  error  probability 
decays  exponentially  with  SNR,  with  the  rate  of  decay  determined  by  the  power  efficiency  of 
the constellation. For example, for BPSK or Gray coded QPSK, the error probability is given
by $p = Q\left(\sqrt{2E_b/N_0}\right)$. We have also seen in Chapter 6 how to engineer the link budget so as
to guarantee a certain desired performance. So far, however, we have only considered uncoded
systems,  in  which  bits  to  be  sent  are  directly  mapped  to  symbols  sent  over  the  channel.  We 
now  indicate  how  it  is  possible  to  improve  performance  by  channel  coding,  which  corresponds  to 
inserting  redundancy  strategically  prior  to  transmission  over  the  channel. 

A  bit  of  historical  perspective  is  in  order.  As  mentioned  in  Chapter  1,  Shannon  showed  the 
optimality  of  separate  source  and  channel  coding  back  in  1948.  Shannon  also  provided  a  theory 
for  computing  the  limits  of  communication  performance  over  any  channel  (given  constraints 
such  as  power  and  bandwidth).  He  did  not  provide  a  constructive  means  of  attaining  these 
limits;  his  proofs  employed  randomized  constructions.  For  reasons  of  computational  complexity, 
it  was  assumed  that  such  strategies  could  never  be  practical.  Hence,  for  decades  after  Shannon’s 
1948  publication,  researchers  focused  on  algebraic  constructions  (for  which  decoding  algorithms 
of  reasonable  complexity  could  be  devised)  to  create  powerful  channel  codes,  but  never  quite 
succeeded in attaining Shannon’s benchmarks. This changed with the invention of turbo codes
by Berrou et al. in 1993: their conference paper laid out a simple coding strategy that got to within
a dB of Shannon capacity. They took codes which were easy to encode, and used scramblers to
make them random-like. Maximum likelihood decoding for such codes is too computationally
complex, but Berrou et al. showed that suboptimal iterative decoding methods provide excellent
performance with reasonable complexity. It was then realized that a different class of random-like
codes, called low density parity check (LDPC) codes, along with an appropriate iterative
decoding procedure, had actually been invented by Gallager in the 1960s. Since then, there has
been a massive effort to devise and implement a wide variety of “turbo-like” codes (i.e., random-like
codes amenable to iterative decoding), with the result that we can now approach Shannon’s
performance benchmarks over almost any channel.

In this chapter, we provide a glimpse of how Shannon’s performance benchmarks are computed,
how channel codes are constructed, and how iterative decoding works. A systematic and comprehensive
treatment of information theory and channel coding would take up entire textbooks
in itself; hence, our goal is to provide just enough exposure to some of the key ideas to encourage
further exploration.

Chapter Plan: In Section 7.1, we discuss two extreme examples, uncoded transmission and
repetition coding, in order to motivate the need for more sophisticated channel coding strategies.
A generic model for channel coding is discussed in Section 7.2. Section 7.3 introduces Shannon’s
information-theoretic framework, which provides fundamental performance limits for any channel
coding scheme, and discusses its practical implications. Linear codes, which are the most
prevalent class of codes used in practice, are introduced in Section 7.4. Finally, we discuss belief
propagation decoding, which has been crucial for approaching Shannon performance limits in
practice, in Section 7.5.

Software:  Concepts  in  belief  propagation  decoding  are  reinforced  by  Software  Lab  7.1. 


7.1  Motivation 


Figure  7.1:  Block  error  probability  versus  bit  error  probability  for  uncoded  transmission  (block 
size  is  1500  bytes). 


Uncoded  transmission:  First,  let  us  consider  what  happens  without  channel  coding.  Suppose 
that  we  are  sending  a  data  block  of  1500  bytes  (i.e.,  n  =  12000  bits,  since  1  byte  comprises  8 
bits) over a binary symmetric channel (see Chapter 5) with bit error probability p, where errors
occur independently for each bit. Such a BSC could be induced, for example, by making hard
decisions for Gray coded QPSK over an AWGN channel; in this case, we have $p = Q\left(\sqrt{2E_b/N_0}\right)$.


Let  us  now  define  block  error  as  the  event  that  one  or  more  bits  in  the  block  are  in  error.  The 
probability that all of the bits get through correctly is given by $(1-p)^n$, so that the probability
of block error is given by

$$P_B = 1 - (1-p)^n$$


Figure  7.1  plots  the  probability  of  block  error  versus  the  probability  of  bit  error  on  a  log-log 
scale.  Despite  its  simplicity,  this  computation  leads  to  some  useful  observations. 

(a) For $p > 10^{-4}$, the probability of block error is essentially one. This is because the expected
number  of  errors  in  the  block  is  given  by  np,  and  when  this  is  of  the  order  of  one,  the  probability 
of  making  at  least  one  error  is  very  close  to  one,  because  of  the  law  of  large  numbers.  Using  this 
reasoning,  we  see  that  it  becomes  harder  and  harder  to  guarantee  reliability  as  the  block  size 
increases,  since  p  must  scale  as  1/n.  Clearly,  this  is  not  a  sustainable  approach.  For  example, 
even the corruption of a single bit in a large computer file can cause chaos, so we must find more
sophisticated  means  of  protecting  the  data  than  just  trying  to  drive  the  raw  bit  error  probability 
to  zero. 

(b)  It  is  often  possible  to  efficiently  detect  block  errors  with  very  high  probability.  In  practice, 
this  might  be  achieved  by  using  a  cyclic  redundancy  check  (CRC)  code,  but  we  do  not  discuss  the 
specific  error  detection  mechanism  here.  If  a  block  error  is  detected,  then  the  receiver  may  ask 
the transmitter to retransmit the packet, if such retransmissions are supported by the underlying
protocols. The link efficiency in this case becomes $1 - P_B$. Thus, if we can do retransmissions,
uncoded transmission may actually not be a terrible idea. In our example, the link is 90% efficient
($P_B = 10^{-1}$) for bit error probability p around $10^{-6}$ to $10^{-5}$, and 99% efficient ($P_B = 10^{-2}$) for p
around $10^{-7}$ to $10^{-6}$.

(c) For Gray coded QPSK, $p = Q\left(\sqrt{2E_b/N_0}\right)$, so that $p = 10^{-6}$ requires $E_b/N_0$ of about 10.55 dB.

This  is  exactly  the  scenario  in  the  link  budget  example  modeling  a  5  GHz  WLAN  link  in  Chapter 
6.  We  see,  therefore,  that  uncoded  transmission,  along  with  retransmissions,  is  a  viable  option 
in  that  setting. 
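The numbers in observations (a) and (b) can be checked with a few lines of code. The sketch below (Python, for illustration; the function names are assumptions) evaluates the block error probability $1 - (1-p)^n$ in a numerically careful way, since $(1-p)^n$ is very close to one when p is small.

```python
import math

def block_error_prob(p, n):
    """P_B = 1 - (1 - p)^n, computed as -expm1(n * log1p(-p)) for accuracy."""
    return -math.expm1(n * math.log1p(-p))

def link_efficiency(p, n):
    """Fraction of blocks that get through without retransmission."""
    return 1 - block_error_prob(p, n)

if __name__ == "__main__":
    n = 12000                      # 1500 bytes
    for p in (1e-4, 1e-5, 1e-6, 1e-7):
        print(p, block_error_prob(p, n), link_efficiency(p, n))
```

For n = 12000, this gives a block error probability of roughly 0.70 at $p = 10^{-4}$, 0.11 at $p = 10^{-5}$, and 0.012 at $p = 10^{-6}$.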


Figure  7.2:  Error  probability  decays  rapidly  as  a  function  of  blocklength  for  a  repetition  code. 


Repetition  coding:  Next,  let  us  consider  the  other  extreme,  in  which  we  send  n  copies  of  a  single 
bit  over  a  BSC  with  error  probability  p.  That  is,  we  either  send  a  string  of  n  zeros,  or  a  string  of 
n  ones.  The  channel  may  flip  some  of  these  bits.  Since  the  errors  are  independent,  the  number  of 
errors is a binomial random variable, Bin(n, p). For $p < 1/2$, the average number of bits in error is
$np < n/2$; hence a natural decoding rule is to employ majority logic: decide on 0 if the majority
of received bits is zero, and on 1 otherwise. Taking n to be odd for simplicity (otherwise we
need to specify a tiebreaker when there are an equal number of zeros and ones), a block error
occurs if the number of errors is $\lceil n/2 \rceil$ or more. Using the binomial PMF, we have the following
expression for the block error probability:

$$P_B = \sum_{m=\lceil n/2 \rceil}^{n} \binom{n}{m} p^m (1-p)^{n-m}$$

Figure 7.2 plots the probability of block error versus n for $p = 10^{-1}$ and $p = 10^{-2}$. Clearly,
$P_B \to 0$ as $n \to \infty$, so we are doing well in terms of reliability. To see why, let us invoke the LLN
again: the average number of errors is $np < \lceil n/2 \rceil$, so that, as $n \to \infty$, the number of errors is
smaller than $\lceil n/2 \rceil$ with probability one. However, we are only sending one bit of information for
every n bits that we send over the channel, corresponding to a code rate of 1/n (one information
bit for every n transmitted bits), which tends to zero as $n \to \infty$.
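The tradeoff between reliability and rate for the repetition code can be tabulated with the following sketch (Python, for illustration; the function name is an assumption). It evaluates the binomial tail for odd n with majority logic decoding.

```python
import math

def repetition_block_error(n, p):
    """Probability that ceil(n/2) or more of n bits are flipped (n odd)."""
    start = (n + 1) // 2           # equals ceil(n/2) for odd n
    return sum(math.comb(n, m) * p ** m * (1 - p) ** (n - m)
               for m in range(start, n + 1))

if __name__ == "__main__":
    for n in (1, 3, 5, 7, 9, 11):
        # Block error probability falls with n, but so does the code rate 1/n.
        print(n, 1 / n, repetition_block_error(n, 0.1))
```

For example, with p = 0.1, going from n = 1 to n = 3 reduces the block error probability from 0.1 to 0.028, at the cost of reducing the code rate from 1 to 1/3.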

We  have  invoked  the  LLN  to  explain  the  performance  of  both  uncoded  transmission  and  repetition 
coding  for  large  n,  but  neither  of  these  approaches  provides  reliable  performance  at  nonzero 
coding rates. As n → ∞, the block error rate $P_B$ → 1 for uncoded transmission, while the
code rate tends to zero for the repetition code. However, it is possible to design channel coding
schemes between these two extremes which provide arbitrarily reliable communication ($P_B$ → 0
as n → ∞) at non-vanishing code rates. The existence of such codes is guaranteed by LLN-style
arguments. For example, for a BSC with crossover probability p, as n gets large, the number
of  errors  clusters  around  np.  Thus,  the  basic  intuition  is  that,  if  we  are  able  to  insert  enough 
redundancy  to  correct  a  number  of  errors  of  the  order  of  np,  then  we  should  be  able  to  approach 
zero  block  error  probability.  Giving  precise  form  to  such  existence  arguments  is  the  realm  of 
information  theory,  which  can  be  used  to  establish  fundamental  performance  limits  for  almost 
any  reasonable  channel  model,  while  coding  theory  concerns  itself  with  constructing  practical 
coding  schemes  that  approach  these  performance  limits.  A  detailed  exposition  of  information 
and  coding  theory  is  well  beyond  our  scope,  but  our  goal  here  is  to  provide  just  enough  exposure 
to  stimulate  and  guide  further  exploration. 


7.2  Model  for  Channel  Coding 

We  introduce  some  basic  terminology  related  to  channel  coding,  and  discuss  where  it  fits  within 
a  communication  link. 

Binary code: An (n, k) binary code maps k information bits to n transmitted bits, where n ≥ k.
Each of the k information bits can take any value in {0, 1}, hence the code C is a set of $2^k$
codewords, each a binary vector of length n. The code rate is defined as Rc = k/n.


Figure 7.3: High-level model for coded system.


Figure 7.3 provides a high-level view of how a binary channel code can be used over a
communication link. The encoder maps the k-bit information word u to an n-bit codeword x. As
discussed shortly, the “channel” shown in the figure is an abstraction that includes operations
at the transmitter and the receiver, in addition to the physical channel. The output y of the
channel is a length n vector of hard decisions (bits) or soft decisions (real numbers) on the coded
bits. These are then used by the decoder to provide an estimate û of the information bits. We
declare a block error if û ≠ u.


Figure 7.4: An example of bit interleaved coded modulation.


Figure  7.4  provides  a  specific  example  illustrating  how  the  preceding  abstraction  connects  to 
the  transceiver  design  framework  developed  in  earlier  chapters.  It  shows  a  binary  code  used 
for  signaling  over  an  AWGN  channel  using  Gray  coded  16QAM.  We  see  that  n  coded  bits  are 
mapped  to  n/4  complex-valued  symbols  at  the  transmitter.  Since  channel  codes  are  typically 
designed  for  random  errors,  we  have  inserted  an  interleaver  between  the  channel  encoder  and 
the  modulator  in  order  to  disperse  potential  correlations  in  errors  among  bits.  The  modulator 
could  employ  linear  modulation  as  described  in  Chapter  4,  with  demodulation  as  in  Chapter 
6  for  an  ideal  AWGN  channel,  or  more  sophisticated  equalization  strategies  for  handling  the 
intersymbol interference due to channel dispersion (see Chapter 8). An alternative frequency
domain modulation strategy for handling channel dispersion, termed Orthogonal Frequency
Division Multiplexing (OFDM), is also discussed in Chapter 8. However, for our present purpose
of  discussing  channel  coding,  we  abstract  all  of  these  details  away.  Indeed,  as  shown  in  Figure 
7.4,  the  “channel”  from  Figure  7.3  includes  all  of  these  operations,  with  the  final  output  being 
the  hard  or  soft  decisions  supplied  to  the  decoder.  Problem  7.4  explores  the  nature  of  this 
equivalent  channel  for  some  example  constellations.  Often,  even  if  the  physical  channel  has 
memory,  the  interleaving  and  deinterleaving  operations  allow  us  to  model  the  equivalent  channel 
as memoryless: the output $y_i$ depends only on the coded bit $x_i$, and the channel is completely
characterized by the conditional density $p(y_i | x_i)$. For example, for hard decisions, we may model
the equivalent channel as a binary symmetric channel with error probability p. For soft decisions,
$y_i$ may be a real number, or may comprise several bits, hence the channel model would be a little
more  complicated. 

The  preceding  approach,  which  neatly  separates  out  the  binary  channel  code  from  the  signal 
processing  related  to  transmitting  and  receiving  over  a  physical  channel,  is  termed  bit  interleaved 
coded  modulation  (BICM).  If  we  use  a  binary  code  of  rate  Rc  =  k/n  and  a  symbol  alphabet  of 
size  M,  then  the  overall  rate  of  communication  over  the  channel  is  given  by  Rc  log2  M  bits  per 
symbol.  From  Chapter  4,  we  know  that,  using  ideal  Nyquist  signaling,  we  can  signal  at  rate  W 
complex-valued symbols/sec over a bandlimited passband channel of bandwidth W. Thus, the
rate of communication in bits per second (bps) is given by Rb = Rc W log2 M. The bandwidth
efficiency, or spectral efficiency, can now be defined as

$$r = \frac{R_b}{W} = R_c \log_2 M = \frac{k}{n} \log_2 M \tag{7.1}$$

in bps/Hz, or bits/symbol (for ideal Nyquist signaling). Comparing with Chapter 4 (where we
termed this quantity $\eta_W$), what has changed is that we must now account for the rate of the
binary code that we have wrapped around our communication link.

We  also  need  to  revisit  our  SNR  concepts  and  carefully  keep  track  of  information  bits,  coded 
bits,  and  modulated  symbols,  when  computing  signal  power  or  energy.  The  quantity  Eb  refers  to 
energy per information bit. When we encode these bits using a binary code of rate Rc, the energy
per coded bit is Ec = Rc Eb (information bits per coded bit, times energy per information bit).
When we then put the coded bits through a modulator that outputs M-ary symbols (log2 M
coded bits per symbol), we obtain that the energy per modulated symbol is given by

$$E_s = E_c \log_2 M = R_c E_b \log_2 M$$

In short, we have

$$E_s = r E_b \tag{7.2}$$


which  makes  sense:  energy  per  symbol  equals  the  number  of  information  bits  per  symbol,  times 
the  energy  per  information  bit.  While  we  have  established  (7.2)  for  BICM,  it  holds  generally, 
since  it  is  just  a  matter  of  energy  bookkeeping. 
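
The bookkeeping in (7.1) and (7.2) can be checked with a few lines of Python. This is a minimal sketch: the function name `bicm_bookkeeping` and its argument names are hypothetical, and the energy per information bit is normalized to Eb = 1 by default.

```python
from math import log2


def bicm_bookkeeping(k: int, n: int, M: int, eb: float = 1.0):
    """Return (r, Ec, Es) for an (n, k) binary code used with an M-ary alphabet.

    r  : spectral efficiency in information bits per symbol, as in (7.1)
    Ec : energy per coded bit
    Es : energy per modulated symbol, as in (7.2)
    """
    rc = k / n                 # binary code rate Rc
    bits_per_symbol = log2(M)  # coded bits carried by each symbol
    r = rc * bits_per_symbol
    ec = rc * eb
    es = ec * bits_per_symbol
    return r, ec, es


# Example: rate-1/2 binary code with 16QAM, Eb normalized to 1.
r, ec, es = bicm_bookkeeping(k=1, n=2, M=16)
print(r, ec, es)
```

For this example the energy per symbol equals r times Eb, as (7.2) requires, regardless of how the rate is split between the code and the constellation.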

BICM  is  a  practical  approach  which  applies  to  any  physical  communication  channel,  and  the 
significant advances in channel coding over the past two decades ensure that there is little loss in
optimality due to this decoupling of coding and modulation. In the preceding example, we have
used  it  in  conjunction  with  Nyquist  sampling,  which  transforms  the  continuous  time  channel 
into  a  discrete  time  channel  carrying  complex-valued  symbols.  However,  we  can  also  view  the 
Nyquist  sampled  channel  in  greater  generality,  in  which  the  inputs  to  the  effective  channel  are 
complex-valued  symbols,  and  the  outputs  are  the  noisy  received  samples  at  the  output  of  the 
equalizer/demodulator. A code of rate R bits/channel use over this channel is simply a collection
of $2^{NR}$ discrete time complex-valued vectors of length N, where N is the number of symbols sent
over the channel. In our BICM example with 16QAM, we have R = Rc log2 M = 4Rc and
N  =  n/4,  but  this  framework  also  accommodates  approaches  which  tie  coding  and  modulation 
more  closely  together.  The  tools  of  information  theory  can  be  used  to  provide  fundamental 
performance  limits  for  any  such  coded  modulation  strategy.  We  provide  a  glimpse  of  such  results 
in  the  next  section. 


7.3  Shannon’s  Promise 

Shannon  established  the  field  of  information  theory  in  the  1940s.  Among  its  many  consequences 
is  the  channel  coding  theorem,  which  states  that,  if  we  allow  code  block  lengths  to  get  large 
enough,  then  there  is  a  well-defined  quantity  called  channel  capacity,  which  determines  the 
maximum  rate  at  which  reliable  communication  can  take  place.  A  class  of  channel  models  of 
fundamental  importance  is  the  following. 

Discrete  memoryless  channel  (DMC):  Inputs  are  fed  to  the  channel  in  discrete  time.  If  x  is 
the channel input at a given time, then the output y at that time has conditional density p(y|x).
For multiple channel uses, the outputs are conditionally independent given the inputs, as follows:

$$p(y_1, \ldots, y_n \mid x_1, \ldots, x_n) = p(y_1 \mid x_1) \cdots p(y_n \mid x_n)$$

The inputs may be constrained in some manner (e.g., to take values from a finite alphabet, or to
be limited in average or peak power). A channel code of length n and rate R bits per channel use
contains $2^{nR}$ codewords. That is, we employ M-ary signaling with M = $2^{nR}$, where each signal,
or codeword, is a vector of length n, with the jth codeword denoted by $X^{(j)} = (X_1^{(j)}, \ldots, X_n^{(j)})^T$,
j = 1, ..., $2^{nR}$.

Shannon’s  channel  coding  theorem  gives  us  a  compact  characterization  of  the  channel  capacity 
C  (in  bits  per  channel  use)  for  a  DMC.  It  states  that,  for  any  code  rate  below  capacity  (R  <  C), 
and  for  large  enough  block  length  n,  there  exist  codes  and  decoding  strategies  such  that  the 
block  error  probability  can  be  made  arbitrarily  small.  The  converse  of  this  result  also  holds:  for 
code  rates  above  capacity  (R  >  C),  the  block  error  probability  is  bounded  away  from  zero  for 
any  coding  strategy.  The  fundamental  intuition  is  that,  for  large  block  lengths,  events  that  cause 
errors  cluster  around  some  well-defined  patterns  with  very  high  probability  (because  of  the  law 
of  large  numbers),  hence  it  is  possible  to  devise  channel  codes  that  can  correct  these  patterns  as 
long  as  we  are  not  trying  to  fit  in  too  many  codewords. 

Giving  the  expression  for  the  Shannon  capacity  of  a  general  DMC  is  beyond  our  scope,  but 
we  do  provide  intuitive  derivations  of  the  channel  capacity  for  the  two  DMC  models  of  greatest 
importance  to  us:  the  discrete  time  AWGN  channel  and  the  BSC.  We  then  discuss,  via  numerical 
examples,  how  these  capacity  computations  can  be  used  to  establish  design  guidelines. 

Discrete  time  AWGN  channel:  Let  us  consider  the  following  real- valued  discrete  time  AWGN 
channel model, where we send a codeword consisting of a sequence of real numbers {$X_i$, i =
1, ..., n}, and obtain the noisy outputs

$$Y_i = X_i + N_i, \quad i = 1, \ldots, n \tag{7.3}$$

where $N_i$ ~ N(0, N) are i.i.d. Gaussian noise samples. We impose a power constraint $E[X_i^2] \le S$.
This model is called the discrete time AWGN channel. For Nyquist signaling over a continuous-time
bandlimited AWGN channel with bandwidth W, we can signal at the rate of W complex-valued
symbols per second, or 2W real-valued symbols/second. This can be interpreted as getting
to use the discrete time AWGN channel (7.3) 2W times per second. Thus, once we figure out
the  capacity  for  the  discrete  time  AWGN  channel  in  bits  per  channel  use,  we  will  be  able  to 
specify  the  maximum  rate  at  which  information  can  be  transmitted  reliably  over  a  bandlimited 
continuous  time  channel. 

A channel code over the discrete time channel (7.3) of rate R bits/channel use contains $2^{nR}$ codewords,
where the jth codeword $X^{(j)} = (X_1^{(j)}, \ldots, X_n^{(j)})^T$ satisfies the average power constraint

$$\|X^{(j)}\|^2 = \sum_{i=1}^{n} |X_i^{(j)}|^2 \le nS$$


If the jth codeword is sent, the received vector is

$$Y = X^{(j)} + N, \quad \text{codeword } j \text{ transmitted} \tag{7.4}$$

where $N = (N_1, \ldots, N_n)^T$ is the noise vector. The expected energy of the noise vector equals

$$E[\|N\|^2] = \sum_{i=1}^{n} E[N_i^2] = nN \tag{7.5}$$


The  expected  energy  of  the  received  vector  equals 

$$E[\|Y\|^2] = \sum_{i=1}^{n} E[|X_i^{(j)} + N_i|^2] = \sum_{i=1}^{n} \left( |X_i^{(j)}|^2 + E[N_i^2] + 2 E[X_i^{(j)} N_i] \right) = \|X^{(j)}\|^2 + nN \le n(S + N) \tag{7.6}$$


(The  cross  term  involving  signal  and  noise  drops  away,  since  they  are  independent  and  the  noise 
is  zero  mean.) 

We  now  provide  a  heuristic  argument  as  to  how  reliable  performance  can  be  achieved  by  letting 
the  code  block  length  n  get  large.  Invoking  the  law  of  large  numbers,  random  quantities  cluster 
around their averages with high probability, so that the received vector Y lies inside an
n-dimensional sphere of radius √(n(S + N)), and the noise vector N lies inside an n-dimensional
sphere of radius √(nN). Consider now a “decoding sphere” around each codeword with radius
just a little larger than √(nN). Then we make correct decisions with high probability: if we
send codeword j, the noise vector N is highly unlikely to push us outside the decoding sphere
centered around $X^{(j)}$. The question then is: what is the largest number of decoding spheres of
radius √(nN) that we can pack inside the n-dimensional sphere of radius √(n(S + N)) in which
the  received  vector  Y  lives?  This  sphere  packing  argument,  depicted  in  Figure  7.5,  provides  an 
estimate  of  the  largest  number  of  codewords  2nR  that  we  can  accommodate  while  guaranteeing 
reliable  communication. 
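
The clustering that this heuristic relies on can be observed numerically. The sketch below draws n i.i.d. Gaussian noise samples and reports the normalized energy ||N||^2 / n, which should approach the noise variance as n grows. The function name `normalized_noise_energy`, the fixed seed, and the variance value 2.0 are illustrative assumptions.

```python
import random


def normalized_noise_energy(n: int, noise_var: float, seed: int = 0) -> float:
    """Return ||N||^2 / n for n i.i.d. zero-mean Gaussian samples of variance noise_var."""
    rng = random.Random(seed)  # fixed seed so that the run is repeatable
    sigma = noise_var ** 0.5
    return sum(rng.gauss(0.0, sigma) ** 2 for _ in range(n)) / n


# As n grows, the noise vector concentrates near a sphere of radius sqrt(n * noise_var).
for n in (10, 1000, 100000):
    print(n, normalized_noise_energy(n, noise_var=2.0))
```

For small n the normalized energy fluctuates noticeably, which is why the sphere packing picture only becomes accurate at large block lengths.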

Figure 7.5: Sphere packing argument for characterizing rate of reliable communication.

We now invoke the result that the volume of an n-dimensional sphere of radius r is $K_n r^n$, where
$K_n$ is a constant depending on n whose explicit form we do not need. We can now estimate the
maximum achievable rate R as follows:

$$2^{nR} \le \frac{K_n \left(\sqrt{n(S + N)}\right)^n}{K_n \left(\sqrt{nN}\right)^n} = \left(1 + \frac{S}{N}\right)^{n/2}$$

from which we obtain

$$R \le \frac{1}{2} \log_2\left(1 + \frac{S}{N}\right)$$

While we have used heuristic arguments to arrive at this result, it can actually be rigorously
demonstrated that the right-hand side of this bound, which is also the right-hand side of (7.7),
is indeed the maximum possible rate of reliable communication over the discrete-time AWGN channel.

Capacity  of  the  discrete  time  AWGN  channel:  We  can  now  state  that  the  capacity  of  the 
discrete  time  AWGN  channel  (7.3)  is  given  by 

$$C_{d\text{-}AWGN} = \frac{1}{2} \log_2\left(1 + \frac{S}{N}\right) \text{ bits/channel use} \tag{7.7}$$


Continuous  time  bandlimited  AWGN  channel:  We  now  use  this  result  to  compute  the 
maximum  spectral  efficiency  attainable  over  a  continuous  time  bandlimited  AWGN  channel. 
The  complex  baseband  channel  corresponding  to  a  passband  channel  of  physical  bandwidth  W 
spans [-W/2, W/2] (taking the reference frequency at the center of the passband). Thus, Nyquist
signaling over this channel corresponds to W complex-valued symbols per second, or 2W uses per
second of the real-valued channel (7.3). Since each complex-valued symbol corresponds to two
uses of the real discrete time AWGN channel, the capacity of the bandlimited channel is given by
2W $C_{d\text{-}AWGN}$ bits per second. We still need to specify S/N. For each complex-valued sample, the
energy per symbol is Es = rEb (bits/symbol, times energy per bit, gives energy per complex-valued
symbol). The noise variance per real dimension is $\sigma^2 = N_0/2$, hence the noise variance seen by a
complex symbol is $2\sigma^2 = N_0$. We obtain

$$\frac{S}{N} = \frac{E_s}{N_0} \tag{7.8}$$


Putting  these  observations  together,  we  can  now  state  the  following  formula  for  the  capacity  of 
the bandlimited AWGN channel.

Capacity of the bandlimited AWGN channel:

$$C_{BL}\left(W, \frac{E_s}{N_0}\right) = W \log_2\left(1 + \frac{E_s}{N_0}\right) \text{ bits per second} \tag{7.9}$$


It  can  be  checked  that  we  get  exactly  the  same  result  for  a  physical  (real-valued)  baseband 
channel  of  physical  bandwidth  W.  Such  a  channel  spans  [-W,  W],  but  the  transmitted  signal  is 
constrained  to  be  real-valued.  Signals  over  such  a  channel  can  therefore  be  represented  by  2 W 
real-valued  samples  per  second,  which  is  the  same  as  for  a  passband  channel  of  bandwidth  W. 
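
The two capacity formulas, (7.7) and (7.9), can be tied together in code. In this sketch the function names are our own, and both signal-to-noise arguments are assumed to be linear ratios rather than dB values. The bandlimited capacity is computed as 2W uses per second of the real discrete time channel, which should agree with W log2(1 + Es/N0).

```python
from math import log2


def capacity_discrete_awgn(snr: float) -> float:
    """Capacity (7.7) in bits per real channel use; snr is the linear ratio S/N."""
    return 0.5 * log2(1.0 + snr)


def capacity_bandlimited_awgn(bandwidth_hz: float, es_over_n0: float) -> float:
    """Capacity (7.9) in bits per second; es_over_n0 is a linear ratio, not dB."""
    # 2W real channel uses per second, each with S/N = Es/N0 as in (7.8).
    return 2.0 * bandwidth_hz * capacity_discrete_awgn(es_over_n0)


print(capacity_discrete_awgn(3.0))            # bits per real channel use at S/N = 3
print(capacity_bandlimited_awgn(20e6, 31.0))  # bits per second over 20 MHz at Es/N0 = 31
```

The bandwidth of 20 MHz and the ratio Es/N0 = 31 are arbitrary illustrative inputs, chosen so that log2(1 + Es/N0) is an integer.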

For  a  system  communicating  reliably  at  a  bit  rate  of  Rb  bps  over  such  a  bandlimited  channel, 
we must have Rb < $C_{BL}$. Using (7.9), we see that the spectral efficiency r = Rb/W in bps/Hz of the
system must therefore satisfy

$$r < \log_2\left(1 + \frac{E_s}{N_0}\right) = \log_2\left(1 + r \frac{E_b}{N_0}\right) \text{ bps/Hz or bits/complex symbol} \tag{7.10}$$


where  we  have  used  (7.2). 

The  preceding  defines  the  regime  where  reliable  communication  is  possible.  We  can  rewrite  (7.10) 
to  obtain  the  fundamental  limits  on  the  power-bandwidth  tradeoffs  achievable  over  the  AWGN 
channel,  as  follows. 


Figure  7.6:  Power-bandwidth  tradeoffs  over  the  AWGN  channel. 


Fundamental  power-bandwidth  tradeoff  for  the  bandlimited  AWGN  channel: 


$$\frac{E_b}{N_0} > \frac{2^r - 1}{r} \quad \text{(regime where reliable communication is possible)} \tag{7.11}$$


The  quantity  r  is  the  bandwidth  efficiency  of  a  bandlimited  AWGN  channel.  This  fundamental 
power-bandwidth  tradeoff  is  depicted  in  Figure  7.6,  which  plots  the  minimum  required  Eb/N0 
(dB)  versus  the  bandwidth  efficiency.  Note  that  we  cannot  make  the  Eb/N0  required  for  reliable 
communication  arbitrarily  small  even  as  the  bandwidth  efficiency  goes  to  zero.  We  leave  it 
as an exercise (Problem 7.6) to show that the minimum possible value of Eb/N0 for reliable
communication (corresponding to r → 0) over the AWGN channel is -1.6 dB. Another point
worth emphasizing is that these power-bandwidth tradeoffs assume powerful channel coding, and
are different from those discussed for uncoded systems in Chapter 6: recall that the bandwidth
efficiency for M-ary uncoded linear modulation was equal to log2 M, and that the power efficiency
was defined as $d_{min}^2 / E_b$. Numerical examples showing how channel coding fundamentally changes the
achievable  power-bandwidth  tradeoffs  are  explored  in  more  detail  in  Section  7.3.1.  We  provide 
here  a  quick  example  that  illustrates  how  (7.11)  relates  to  real  world  scenarios. 

Example  7.3.1  (evaluating  system  feasibility  using  Shannon  limits)  A  company  claims 
to  have  developed  a  wireless  modem  with  a  receiver  sensitivity  of  -82  dBm  and  a  noise  figure  of 
6  dB,  operating  at  a  rate  of  100  Mbps  over  a  bandwidth  of  20  MHz.  Do  you  believe  their  claim? 
Shannon  limit  calculations:  Modeling  the  channel  as  an  ideal  bandlimited  AWGN  channel,  the 
proposed modem must satisfy (7.11). Assuming no excess bandwidth, r = (100 Mbps)/(20 MHz) = 5 bps/Hz
or bits/symbol. From (7.11), we know that we must have (Eb/N0)required > $(2^5 - 1)/5$ = 6.2, or 7.9
dB. The noise PSD N0 is given by -174 + 6 = -168 dBm over 1 Hz. The energy per bit equals
the received power divided by the bit rate, so that the actual Eb/N0 for the advertised receiver
sensitivity (i.e., the receive power at which the modem can operate) is given by (Eb/N0)actual =
-82 - 10 log10 $10^8$ + 168 = 6 dB. This is 1.9 dB short of the Shannon limit, hence our first instinct
is  not  to  believe  them. 

Tweaking  the  channel  model:  What  if  the  channel  was  not  a  single  AWGN  channel,  but  two 
AWGN  channels  in  parallel?  As  we  shall  see  when  we  discuss  multiple  antenna  systems  in 
Chapter  8,  it  is  possible  to  use  multiple  antennas  at  the  transmitter  and  receiver  to  obtain  spatial 
degrees  of  freedom  in  addition  to  those  in  time  and  frequency.  If  there  are  indeed  two  spatial 
channels  that  are  created  using  multiple  antennas  and  we  can  model  each  of  them  as  AWGN, 
then the spectral efficiency per channel is 5/2 = 2.5 bps/Hz, and (Eb/N0)required > $(2^{2.5} - 1)/2.5$ = 1.86,
or 2.7 dB. Since the actual Eb/N0 is 6 dB, the system is operating more than 3 dB away from
the  Shannon  limit.  Since  we  do  have  practical  channel  codes  that  get  to  within  a  dB  or  less  of 
Shannon  capacity,  the  claim  now  becomes  believable. 
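
The arithmetic in Example 7.3.1 can be reproduced with a short script. This is a sketch under the example's own assumptions (ideal bandlimited AWGN, no excess bandwidth, thermal noise floor of -174 dBm over 1 Hz). The function names `min_ebn0_db` and `actual_ebn0_db` are hypothetical. The last line probes the r → 0 limit numerically by evaluating (7.11) at a very small r.

```python
from math import log10


def min_ebn0_db(r: float) -> float:
    """Shannon limit (7.11) on Eb/N0, in dB, at spectral efficiency r > 0."""
    return 10.0 * log10((2.0**r - 1.0) / r)


def actual_ebn0_db(sensitivity_dbm: float, noise_figure_db: float, bit_rate_bps: float) -> float:
    """Eb/N0 in dB at the receiver sensitivity, with thermal noise at -174 dBm over 1 Hz."""
    n0_dbm_per_hz = -174.0 + noise_figure_db
    eb_dbm = sensitivity_dbm - 10.0 * log10(bit_rate_bps)  # received power / bit rate
    return eb_dbm - n0_dbm_per_hz


r = 100e6 / 20e6  # bit rate divided by bandwidth, in bps/Hz
actual = actual_ebn0_db(-82.0, 6.0, 100e6)
print(f"single channel: need {min_ebn0_db(r):.1f} dB, have {actual:.1f} dB")
print(f"two channels:   need {min_ebn0_db(r / 2):.1f} dB, have {actual:.1f} dB")
print(f"limit as r -> 0: {min_ebn0_db(1e-6):.2f} dB")
```

Splitting the rate over two parallel channels halves r per channel, and because the required Eb/N0 grows exponentially in r, the requirement drops by several dB.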


Figure  7.7:  Binary  symmetric  channel  with  crossover  p. 


Binary  symmetric  channel:  We  now  turn  our  attention  to  the  BSC  with  crossover  probability 
p  shown  in  Figure  7.7,  which  might,  for  example,  be  induced  by  hard  decisions  on  an  AWGN 
channel. Note that we are only interested in 0 ≤ p ≤ 1/2. If p > 1/2, then we can switch zeros and
ones at the output of the channel to get back to a BSC with crossover probability 1 - p < 1/2.
The  BSC  can  also  be  written  as  an  additive  noise  channel,  analogous  to  the  discrete  time  AWGN 
channel  (7.3): 

$$Y_i = X_i \oplus N_i, \quad i = 1, \ldots, n \tag{7.12}$$

where the exclusive or (XOR) symbol ⊕ corresponds to addition modulo 2, which follows the
rules:

$$1 \oplus 1 = 0 \oplus 0 = 0, \qquad 1 \oplus 0 = 0 \oplus 1 = 1 \tag{7.13}$$

Thus, we can flip a bit by adding (modulo 2) a 1 to it. The probability of a bit flip is p. Thus,
the noise variables $N_i$ are i.i.d. Bernoulli random variables with $P[N_i = 1] = p = 1 - P[N_i = 0]$.

Just as with the AWGN channel, we now develop a sphere packing argument to provide an
intuitive  derivation  of  the  BSC  channel  capacity.  Of  course,  our  concept  of  distance  must  be 
different  from  the  Euclidean  distance  considered  for  the  AWGN  channel.  Define  the  Hamming 
distance  between  two  binary  vectors  of  equal  length  to  be  the  number  of  places  in  which  they 
differ.  For  a  codeword  of  length  n,  the  average  number  of  errors  equals  np.  Assuming  that 
the  number  of  errors  clusters  around  this  average  for  large  n,  define  a  decoding  sphere  around  a 
codeword  as  all  sequences  which  are  at  Hamming  distance  of  np  or  less  from  it.  The  number  of 
such  sequences  is  called  the  volume  of  the  decoding  sphere.  By  virtue  of  (7.12)  and  (7.13),  we 
see that this volume is exactly equal to the number of noise vectors $N = (N_1, \ldots, N_n)^T$ with np
or fewer ones (the number of ones in a sequence is called its weight). The number of n-length
vectors with weight m equals $\binom{n}{m}$, hence the number of vectors with weight at most np is
given by

$$V_n = \sum_{m=0}^{np} \binom{n}{m} \tag{7.14}$$

We  state  without  proof  the  following  asymptotic  approximation  for  Vn  for  large  n: 

$$V_n \approx 2^{n H_B(p)}, \quad 0 \le p \le \frac{1}{2} \tag{7.15}$$

where $H_B(\cdot)$ is the binary entropy function, defined by

$$H_B(p) = -p \log_2 p - (1 - p) \log_2 (1 - p), \quad 0 \le p \le 1 \tag{7.16}$$

We plot the binary entropy function in Figure 7.8(a). Note the symmetry around p = 1/2. This is
because, as mentioned earlier, we can map p > 1/2 to 1 - p < 1/2 by switching the roles of 0 and 1
at the output.
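
The asymptotic approximation (7.15) can be explored numerically by comparing the exponent log2(V_n)/n with the binary entropy (7.16). In this sketch the function names are our own, and we assume that np is rounded down when it is not an integer. Floating-point rounding of n * p may occasionally shift the radius by one, which does not matter for an illustration.

```python
from math import comb, floor, log2


def binary_entropy(p: float) -> float:
    """Binary entropy function H_B(p) in bits, with H_B(0) = H_B(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def sphere_volume_exponent(n: int, p: float) -> float:
    """Return log2(V_n) / n, where V_n counts n-bit vectors of weight at most np."""
    radius = floor(n * p)  # assumption: round np down to an integer radius
    volume = sum(comb(n, m) for m in range(radius + 1))  # exact integer arithmetic
    return log2(volume) / n


p = 0.1
for n in (10, 100, 1000):
    print(n, sphere_volume_exponent(n, p), binary_entropy(p))
```

The exponent approaches the entropy from below as n grows; the gap shrinks slowly, on the order of (log n)/n.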


Figure 7.8: (a) The binary entropy function $H_B(p)$; (b) the capacity of a BSC with crossover
probability p, given by 1 - $H_B(p)$.

For a length n code of rate R bits/channel use, the number of codewords equals $2^{nR}$. The total
number of binary sequences of length n, or the entire volume of the space we are working in, is
$2^n$. Thus, if we wish to put a decoding sphere of volume $V_n$ around each codeword, the maximum
number of codewords we can fit must satisfy

$$2^{nR} \le \frac{2^n}{V_n} \approx \frac{2^n}{2^{n H_B(p)}} = 2^{n(1 - H_B(p))}$$

which gives

$$R \le 1 - H_B(p)$$

It  can  be  rigorously  demonstrated  that  the  right-hand  side  actually  equals  the  capacity  of  the 
BSC.  We  therefore  state  this  result  formally. 

Capacity  of  BSC:  The  capacity  of  a  BSC  with  crossover  probability  p  is  given  by 

$$C_{BSC}(p) = 1 - H_B(p) \text{ bits/channel use} \tag{7.17}$$

The  capacity  is  plotted  in  Figure  7.8(b).  We  note  the  following  points. 

• For p = 1/2, the channel is useless (its output does not depend on the input) and has capacity
zero.

• For p > 1/2, we switch zeros and ones at the output to obtain an effective BSC with crossover
probability 1 - p, hence $C_{BSC}(p) = C_{BSC}(1 - p)$.

• For p = 0 or p = 1, the channel is perfect, and the capacity attains its maximum value of 1 bit/channel
use.
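
The three observations above can be confirmed by evaluating (7.17) directly. The following sketch uses hypothetical function names and a handful of arbitrarily chosen crossover probabilities.

```python
from math import log2


def binary_entropy(p: float) -> float:
    """Binary entropy function H_B(p) in bits, with H_B(0) = H_B(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def bsc_capacity(p: float) -> float:
    """Capacity (7.17) of a BSC with crossover probability p, in bits per channel use."""
    return 1.0 - binary_entropy(p)


# Useless channel at p = 1/2, perfect channel at p = 0 or 1, symmetry about 1/2.
for p in (0.0, 0.11, 0.5, 0.89, 1.0):
    print(p, bsc_capacity(p))
```

A crossover probability near 0.11 is a convenient reference point, since the capacity there is close to 1/2 bit per channel use.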


7.3.1  Design  Implications  of  Shannon  Limits 

Since  the  invention  of  turbo  codes  in  1993  and  the  subsequent  rediscovery  of  LDPC  codes,  we 
now know how to construct random-looking codes that can be efficiently decoded (typically using
iterative or message passing methods) and come extremely close to Shannon limits. Such
“turbo-like” coded modulation strategies have made, or are making, their way into almost every
digital communication technology, including cellular, WiFi, digital video broadcast, optical
communication, magnetic recording, and flash memory. Thus, we often summarize the performance
of a practical coded modulation scheme by stating how far away it is from the Shannon limit.
Let us discuss what this means via an example. A rate 1/2 binary code is employed using bit
interleaved coded modulation with a 16QAM alphabet. We are told that it operates 2 dB away
from the Shannon limit at a BER of $10^{-5}$. What does this statement mean?

Since  we  can  convey  4  coded  bits  every  time  we  send  a  16QAM  symbol,  the  spectral  efficiency 
r = (1/2) × 4 = 2 bps/Hz, or information bits per symbol (the product of the binary code rate,
which is the number of information bits per coded bit, and the number of coded bits per symbol
gives the number of information bits per symbol). From (7.11), the minimum possible Eb/N0 is found to
be about 1.8 dB. This is the minimum possible Eb/N0 at which Shannon tells us that error-free
operation (in the limit of large code blocklengths) is possible at the given spectral efficiency.
Of course, any practical strategy at finite blocklength, no matter how large, will not give us
error-free operation, hence we declare some value of error probability that we are satisfied with,
and evaluate the Eb/N0 for which that error probability is attained. Hence the statement that
we started with says that our particular coded modulation strategy provides BER of $10^{-5}$ at
Eb/N0 = 1.8 + 2 = 3.8 dB (2 dB higher than the Shannon limit).

How  much  gain  does  the  preceding  approach  provide  over  uncoded  communication?  Let  us 
compare  it  with  uncoded  QPSK,  which  has  the  same  spectral  efficiency  of  2  bps/Hz.  The  BER 
is given by the expression Q(√(2Eb/N0)), and we can check that $10^{-5}$ BER is attained at Eb/N0
of 9.6 dB. Thus, we get a significant coding gain of 9.6 - 3.8 = 5.8 dB from using a sophisticated
coded  modulation  strategy,  first  expanding  the  constellation  from  QPSK  to  16QAM  so  there  is 
“room”  to  insert  redundancy,  and  then  using  a  powerful  binary  code. 
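
The numbers in this comparison can be checked with a short script. This is a sketch rather than a definitive calculation: the Q function is written in terms of the complementary error function, the uncoded QPSK operating point is found by bisection over an assumed bracket of 0 to 20 dB, and all function and variable names are our own.

```python
from math import erfc, log10, sqrt


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x), written via the complementary error function."""
    return 0.5 * erfc(x / sqrt(2.0))


def qpsk_ebn0_db_for_ber(target_ber: float) -> float:
    """Eb/N0 (dB) at which uncoded QPSK, with BER Q(sqrt(2 Eb/N0)), reaches target_ber."""
    lo, hi = 0.0, 20.0  # assumed search bracket in dB
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        ber = q_function(sqrt(2.0 * 10.0 ** (mid / 10.0)))
        if ber > target_ber:
            lo = mid  # BER still too high: more Eb/N0 is needed
        else:
            hi = mid
    return 0.5 * (lo + hi)


r = 0.5 * 4                                    # rate-1/2 code with 16QAM, in bps/Hz
shannon_db = 10.0 * log10((2.0**r - 1.0) / r)  # Shannon limit from (7.11)
coded_db = shannon_db + 2.0                    # scheme operates 2 dB from the limit
uncoded_db = qpsk_ebn0_db_for_ber(1e-5)
print(f"Shannon limit {shannon_db:.1f} dB, coded {coded_db:.1f} dB, "
      f"uncoded QPSK {uncoded_db:.1f} dB, gain {uncoded_db - coded_db:.1f} dB")
```

Bisection is used only because the BER is monotone decreasing in Eb/N0; any root-finding method would serve equally well here.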

The specific approach for applying a turbo-like coded modulation strategy depends on the system
at hand. For systems with retransmissions (e.g., wireless data), we are often happy with block
error rates of 1% or even higher, and may be able to use these relatively relaxed specifications to
focus on reducing computational complexity and coding delay (e.g., by considering simpler codes
and smaller block lengths). For systems where there is no scope for retransmissions (e.g., storage
or broadcast), we may use longer block lengths, and may even layer an outer code to clean up the
residual  errors  from  an  inner  turbo-like  coded  modulation  scheme.  Another  common  feature  of 
many  systems  is  the  use  of  adaptive  coded  modulation,  in  which  the  spectral  efficiency  is  varied 
as  a  function  of  the  channel  quality.  BICM  is  particularly  convenient  for  this  purpose,  since  it 
allows us to mix and match a menu of well-optimized binary codes at different rates (e.g., a range
of rates from 1/2 upward) with a menu of standard constellations (e.g., QPSK, 8PSK, 16QAM, 64QAM) to
provide  a  large  number  of  options. 

A  detailed  description  of  turbo-like  codes  is  beyond  our  present  scope,  but  we  do  provide  a 
discussion  of  decoding  via  message  passing  after  a  basic  exposition  of  linear  codes. 


7.4  Introducing  linear  codes 

After  decades  of  struggling  to  construct  practical  coding  strategies  that  approach  Shannon’s 
performance  limits,  we  can  now  essentially  declare  victory,  with  channel  codes  of  reasonable 
block  length  coming  within  less  than  a  dB  of  capacity.  We  refer  the  reader  to  more  advanced 
texts  and  the  research  literature  for  details  regarding  such  capacity-achieving  codes,  and  limit 
ourselves  here  to  establishing  some  basic  terminology  and  concepts  which  provide  a  roadmap  for 
further  exploration.  We  restrict  attention  to  linear  codes,  which  are  by  far  the  most  prevalent 
class  of  codes  in  use  today,  and  suffice  to  approach  capacity.  As  we  discuss  shortly,  a  linear  code 
is  a  subspace  in  a  bigger  vector  space,  but  the  arithmetic  we  use  to  define  linearity  is  different 
from  the  real-  and  complex-valued  arithmetic  we  are  used  to. 

Finite  fields:  We  are  used  to  doing  calculations  with  real  and  complex  numbers.  The  real 
numbers,  together  with  the  rules  of  arithmetic,  comprise  the  real  field,  and  the  complex  numbers, 
together  with  the  rules  of  arithmetic,  comprise  the  complex  field.  Each  of  these  fields  has 
infinitely  many  elements,  forming  a  continuum.  However,  the  basic  rules  of  arithmetic  (addition, 
multiplication,  division  by  nonzero  elements,  and  the  associative,  distributive  and  commutative 
laws)  can  also  be  applied  to  fields  with  a  finite  number  of  discrete  elements.  Such  fields  are 
called  finite  fields,  or  Galois  fields  (after  the  French  mathematician  who  laid  the  foundations  for 
finite  field  theory).  It  turns  out  that,  in  order  to  be  consistent  with  the  basic  rules  of  arithmetic, 
the number of elements in a finite field must be a power of a prime, and we denote a finite field
with $p^m$ elements, where $p$ is a prime and $m$ a positive integer, as GF($p^m$). The theory of finite
fields, while outside our scope here, is essential for a variety of algebraic code constructions, and
we provide some pointers for further study later in this chapter. Our own discussion here is
restricted to codes over the binary field GF(2).

Binary arithmetic: Binary arithmetic corresponds to operations with only two elements, 0
and 1, with addition modulo 2 as specified in (7.13). Binary subtraction is identical to binary
addition. Multiplication and division (division only permitted by nonzero elements) are trivial,
since the only nonzero element is 1. The usual associative, distributive and commutative laws
apply. The elements {0, 1}, together with these rules of binary arithmetic, are said to comprise
the binary field GF(2).
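The rules of binary arithmetic can be sketched in a few lines of Python. This is a minimal illustration, assuming bits are represented as the integers 0 and 1; the function names `gf2_add` and `gf2_mul` are our own and do not come from the text.

```python
# Addition in GF(2) is addition modulo 2 (XOR); multiplication is AND.
def gf2_add(a, b):
    return a ^ b

def gf2_mul(a, b):
    return a & b

# Print the full addition and multiplication tables.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "sum:", gf2_add(a, b), "product:", gf2_mul(a, b))

# Subtraction coincides with addition: adding b twice returns the original value.
subtraction_is_addition = all(
    gf2_add(gf2_add(a, b), b) == a for a in (0, 1) for b in (0, 1)
)
print(subtraction_is_addition)
```

Using bitwise operators rather than `%` keeps the sketch close to how binary arithmetic is typically implemented in hardware.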

Linear binary code: An $(n, k)$ binary linear code $C$ consists of $2^k$ possible codewords, each
of length $n$, such that adding any two codewords in binary arithmetic yields another codeword.
That is, $C$ is closed under linear combinations (the coefficients of the linear combination can only
take values 0 or 1, since we are working in binary arithmetic), and is therefore a $k$-dimensional
subspace of the $n$-dimensional vector space of all binary vectors of length $n$, in a manner that is
entirely analogous to the concept of subspace in real-valued vector spaces. Pursuing this analogy
further, we can specify a linear code $C$ completely by defining a basis with $k$ vectors, such that
any vector in $C$ (i.e., any codeword) can be expressed as a linear combination of the basis vectors.

Food for thought: The all-zero codeword is always part of any linear code. Why?

Notational convention: While we have preferred working with column vectors thus far, in deference
to the convention in most coding theory texts, we express codewords as row vectors. Letting
$\mathbf{u}$ and $\mathbf{v}$ denote two binary vectors of the same length, we denote by $\mathbf{u} \oplus \mathbf{v}$ their component by
component addition over the binary field. For example, $(00110) \oplus (10101) = (10011)$.

Example 7.4.1 (Repetition code) An $(n, 1)$ repetition code has only two codewords, the all-one
codeword $\mathbf{x}_1 = (1, ..., 1)$ and the all-zero codeword $\mathbf{x}_0 = (0, ..., 0)$. We see that
$\mathbf{x}_1 \oplus \mathbf{x}_1 = \mathbf{x}_0 \oplus \mathbf{x}_0 = \mathbf{x}_0$ and that $\mathbf{x}_1 \oplus \mathbf{x}_0 = \mathbf{x}_0 \oplus \mathbf{x}_1 = \mathbf{x}_1$, so that this is indeed a linear code.
There are only $2^1$ codewords, so that the dimension $k = 1$. Thus, the code, when viewed as a vector
space over the binary field, is spanned by a single basis vector, $\mathbf{x}_1$. While the encoding operation is
trivial (just repeat the information bit $n$ times) for this code, let us write it in a manner that
leads into a more general formalism. For example, for the (5, 1) repetition code, the information
bit $u \in \{0, 1\}$ is mapped to codeword $\mathbf{x}$ as follows:

$$\mathbf{x} = u\mathbf{G}$$

where

$$\mathbf{G} = (1 \; 1 \; 1 \; 1 \; 1) \qquad (7.18)$$

is a matrix whose rows (just one row in this case) provide a basis for the code.


Example 7.4.2 (Single parity check code) An $(n, n-1)$ single parity check code takes as
input $n-1$ unconstrained information bits $\mathbf{u} = (u_1, ..., u_{n-1})$, maps them unchanged to $n-1$ bits
in the codeword, and adds a single parity check bit to obtain a codeword $\mathbf{x} = (x_1, ..., x_{n-1}, x_n)$.
For example, we can set the first $n-1$ code bits to the information bits ($x_1 = u_1, ..., x_{n-1} = u_{n-1}$)
and append a parity check bit as follows:

$$x_n = x_1 \oplus x_2 \oplus \cdots \oplus x_{n-1}$$

Here, the code dimension $k = n-1$, so that we can describe the code using $n-1$ linearly
independent basis vectors. For example, for the (5, 4) single parity check code, a particular
choice of basis vectors, put as rows of a matrix, is as follows:

$$\mathbf{G} = \begin{pmatrix} 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 1 \end{pmatrix} \qquad (7.19)$$

We can now check that any codeword can be written as

$$\mathbf{x} = \mathbf{u}\mathbf{G}$$

where $\mathbf{u} = (u_1, ..., u_4)$ is the information bit sequence.


Generator matrix: While the preceding examples are very simple, they provide insight into
the general structure of linear codes. An $(n, k)$ linear code can be represented by a basis with
$k$ linearly independent vectors, each of length $n$. Putting these $k$ basis vectors as the rows of a
$k \times n$ matrix $\mathbf{G}$, we can then define a mapping from $k$ information bits, represented as a $1 \times k$
row vector $\mathbf{u}$, to $n$ code bits, represented as a $1 \times n$ row vector $\mathbf{x}$, as follows:

$$\mathbf{x} = \mathbf{u}\mathbf{G} \qquad (7.20)$$

The matrix $\mathbf{G}$ is called the generator matrix for the code, since it can be used to generate all $2^k$
codewords by cycling through all possible values of the information vector $\mathbf{u}$.
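The encoding map (7.20) can be sketched as follows. This is a minimal illustration in plain Python, assuming vectors are tuples of 0s and 1s and the generator matrix is a list of rows; the helper names `encode` and `all_codewords` are our own. The matrix used is the systematic generator of the (5, 4) single parity check code.

```python
from itertools import product

def encode(u, G):
    """Return x = uG over GF(2); u is a length-k tuple, G a list of k rows of length n."""
    n = len(G[0])
    return tuple(sum(u[i] * G[i][j] for i in range(len(G))) % 2 for j in range(n))

def all_codewords(G):
    """Cycle through all 2^k information vectors and collect the codewords."""
    k = len(G)
    return [encode(u, G) for u in product((0, 1), repeat=k)]

# Generator matrix for the (5, 4) single parity check code: [identity | parity column].
G_spc = [
    (1, 0, 0, 0, 1),
    (0, 1, 0, 0, 1),
    (0, 0, 1, 0, 1),
    (0, 0, 0, 1, 1),
]

codewords = all_codewords(G_spc)
print(len(codewords))                          # number of codewords, 2^4
print(all(sum(x) % 2 == 0 for x in codewords)) # every codeword has even parity
```

Brute-force enumeration like this is only practical for small $k$, which is exactly why the structural descriptions developed next are useful.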

Dual codes: Drawing again on our experience with real-valued vector spaces, we know that,
for any $k$-dimensional subspace $C$ in an $n$-dimensional vector space, we can find an orthogonal
$(n-k)$-dimensional subspace $C^\perp$ such that every vector in $C$ is orthogonal to every vector in $C^\perp$.
The subspace $C^\perp$ is itself an $(n, n-k)$ code, and $C$ and $C^\perp$ are said to be duals of each other.

Example 7.4.3 (Duality of repetition and single parity check codes) It can be checked
that the (5, 4) single parity check code and the (5, 1) repetition code are duals of each other. That
is, each codeword in the (5, 4) code is orthogonal to each codeword in the (5, 1) code. Since
codewords are linear combinations of rows of the generator matrix, it suffices to check that each
row of a generator matrix for the (5, 4) code is orthogonal to each row of a generator matrix for
the (5, 1) code. Specifically,

$$\mathbf{G}_{(5,1)} \mathbf{G}_{(5,4)}^T = (1 \; 1 \; 1 \; 1 \; 1) \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 1 & 1 & 1 \end{pmatrix} = \mathbf{0}$$

Parity check matrix: The preceding discussion shows that we can describe an $(n, k)$ linear
code $C$ by specifying its dual code $C^\perp$. In particular, a generator matrix for the dual code serves
as a parity check matrix $\mathbf{H}$ for $C$, in the sense that an $n$-dimensional binary vector $\mathbf{x}$ lies in $C$ if
and only if it is orthogonal to each row of $\mathbf{H}$. That is,

$$\mathbf{H}\mathbf{x}^T = \mathbf{0} \text{ if and only if } \mathbf{x} \in C \qquad (7.21)$$

Each row of the parity check matrix defines a parity check equation. Thus, for a parity check
matrix $\mathbf{H}$ of dimension $(n-k) \times n$, each codeword must satisfy $n-k$ parity check equations.
Equivalently, if $\mathbf{G}$ is a generator matrix for $C$, then it must satisfy

$$\mathbf{H}\mathbf{G}^T = \mathbf{0} \qquad (7.22)$$

In our examples, the generator matrix for the (5, 1) repetition code is a parity check matrix for
the (5, 4) code, and vice versa.

For an $(n, k)$ code with large $n$ and $k$, it is clearly difficult to check by brute force
enumeration over $2^k$ codewords whether a particular $n$-dimensional vector $\mathbf{y}$ is a valid codeword.
However, for a linear code, it becomes straightforward to verify this using only $n-k$ parity check
equations, as in (7.21). These parity check equations, which provide the redundancy required
to overcome channel errors, are important not only for verification of correct termination of
decoding, but also play a crucial role during the decoding process, as we illustrate shortly.
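The membership test (7.21) can be sketched as follows, assuming the same tuple representation for binary vectors; `parity_checks` and `is_codeword` are illustrative names of our own. We use the duality of the (5, 1) and (5, 4) codes: the generator matrix of one serves as the parity check matrix of the other.

```python
def parity_checks(H, y):
    """Evaluate H y^T over GF(2): one bit per parity check equation."""
    return tuple(sum(h * b for h, b in zip(row, y)) % 2 for row in H)

def is_codeword(H, y):
    """y is a codeword if and only if every parity check evaluates to zero."""
    return not any(parity_checks(H, y))

# Generator of the (5, 1) repetition code = parity check matrix of the (5, 4) code.
H_spc = [(1, 1, 1, 1, 1)]

# Generator of the (5, 4) code = parity check matrix of the (5, 1) repetition code.
H_rep = [
    (1, 0, 0, 0, 1),
    (0, 1, 0, 0, 1),
    (0, 0, 1, 0, 1),
    (0, 0, 0, 1, 1),
]

print(is_codeword(H_spc, (1, 1, 0, 0, 0)))  # even weight: satisfies the single check
print(is_codeword(H_spc, (1, 0, 0, 0, 0)))  # odd weight: violates the single check
print(is_codeword(H_rep, (1, 1, 1, 1, 1)))  # all-one word of the repetition code
print(is_codeword(H_rep, (1, 1, 1, 1, 0)))  # not a repetition codeword
```

The check costs $n-k$ inner products, in contrast to a search over $2^k$ codewords.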


Non-uniqueness: An $(n, k)$ linear code $C$ is a unique subspace consisting of a set of $2^k$ codewords,
and its dual $(n, n-k)$ code $C^\perp$ is a unique subspace comprising $2^{n-k}$ codewords. However,
in general, neither the generator nor the parity check matrix for a code is unique, since the
choice of basis for a nontrivial subspace is not unique. Thus, while the generator matrix for
the (5, 1) code is unique because of its trivial nature (one dimension, binary field), the generator
matrix for the (5, 4) code is not. For example, by taking linear combinations of the rows in (7.19),
we obtain another linearly independent basis that provides an alternative generator matrix for
the (5, 4) code:

From (7.20), we see that different choices of generator matrices correspond to different ways of
encoding a $k$-dimensional information vector $\mathbf{u}$ into an $n$-dimensional codeword $\mathbf{x} \in C$.
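To make non-uniqueness concrete, the following sketch builds one alternative generator matrix for the (5, 4) single parity check code by adding adjacent rows of the systematic generator, and confirms that both matrices generate the same set of codewords. The particular row combinations are our own choice for illustration and are not necessarily those used in (7.23); the helper names are also our own.

```python
from itertools import product

def encode(u, G):
    """Return x = uG over GF(2)."""
    n = len(G[0])
    return tuple(sum(u[i] * G[i][j] for i in range(len(G))) % 2 for j in range(n))

def code(G):
    """Set of all codewords generated by G."""
    return {encode(u, G) for u in product((0, 1), repeat=len(G))}

def add_rows(a, b):
    return tuple(x ^ y for x, y in zip(a, b))

G = [
    (1, 0, 0, 0, 1),
    (0, 1, 0, 0, 1),
    (0, 0, 1, 0, 1),
    (0, 0, 0, 1, 1),
]

# Illustrative alternative basis: replace each of the first three rows by its sum
# with the next row. This is an invertible change of basis, so the span is unchanged.
G_alt = [add_rows(G[0], G[1]), add_rows(G[1], G[2]), add_rows(G[2], G[3]), G[3]]

print(G_alt)
print(code(G) == code(G_alt))  # same code, different encoding map
print(encode((1, 0, 0, 0), G), encode((1, 0, 0, 0), G_alt))
```

The same information vector maps to different codewords under the two matrices, even though the set of codewords is identical.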


Systematic encoding: A systematic encoding is one in which the information vector $\mathbf{u}$ appears
directly in $\mathbf{x}$ (without loss of generality, we can take the bits of $\mathbf{u}$ to be the first $k$ bits in $\mathbf{x}$), so
that there is a clear separation between “information bits” and “parity check bits.” In this case,
the generator matrix can be written as

$$\mathbf{G} = [\mathbf{I}_k \,|\, \mathbf{P}] \quad \text{systematic encoding} \qquad (7.24)$$

where $\mathbf{I}_k$ denotes the $k \times k$ identity matrix, and $\mathbf{P}$ is a $k \times (n-k)$ matrix specifying how the $n-k$
parity bits depend on the input. The identity matrix ensures that the $k$ rows of $\mathbf{G}$ are linearly
independent, so this does represent a valid generator matrix for an $(n, k)$ code. The $i$th row of
the generator matrix (7.24) corresponds to an information vector $\mathbf{u} = (u_1, ..., u_k)$ with $u_i = 1$
and $u_j = 0$, $j \neq i$. Note that, even when we restrict the encoding to be systematic, the generator
matrix is not unique in general. The generator matrices (7.18) and (7.19) for the (5, 1) and (5, 4)
codes correspond to systematic encoding. The encoding of the (5, 4) code corresponding to the
generator matrix in (7.23) is not systematic.

Reading off a parity check matrix from a systematic generator matrix: If we are given
a systematic encoding of the form (7.24), we can easily read off a parity check matrix as follows:

$$\mathbf{H} = [-\mathbf{P}^T \,|\, \mathbf{I}_{n-k}] \qquad (7.25)$$

where the negative sign can be dropped for the binary field. The identity matrix ensures that
the $n-k$ rows of $\mathbf{H}$ are linearly independent, hence this is a valid parity check matrix for an $(n, k)$
linear code. We leave it as an exercise to verify, by directly substituting from (7.24) and (7.25),
that $\mathbf{H}\mathbf{G}^T = \mathbf{0}$.
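The recipe (7.25) can be sketched as a small function, assuming a binary generator matrix already in the systematic form (7.24) and given as a list of row tuples. The function name `parity_check_from_systematic` is our own. Applied to the (5, 4) single parity check code, it recovers the generator of the (5, 1) repetition code, as expected from duality.

```python
def parity_check_from_systematic(G):
    """Given G = [I_k | P] over GF(2), return H = [P^T | I_(n-k)] as a list of rows."""
    k, n = len(G), len(G[0])
    for i in range(k):
        for j in range(k):
            if G[i][j] != (1 if i == j else 0):
                raise ValueError("G is not in systematic form [I_k | P]")
    P = [row[k:] for row in G]
    H = []
    for r in range(n - k):
        p_transpose_row = tuple(P[i][r] for i in range(k))
        identity_row = tuple(1 if c == r else 0 for c in range(n - k))
        H.append(p_transpose_row + identity_row)
    return H

def product_H_Gt(H, G):
    """Compute H G^T over GF(2); every entry should be zero for a valid pair."""
    return [[sum(a * b for a, b in zip(h, g)) % 2 for g in G] for h in H]

G_spc = [
    (1, 0, 0, 0, 1),
    (0, 1, 0, 0, 1),
    (0, 0, 1, 0, 1),
    (0, 0, 0, 1, 1),
]
H_spc = parity_check_from_systematic(G_spc)
print(H_spc)
print(product_H_Gt(H_spc, G_spc))
```

The numerical check of $\mathbf{H}\mathbf{G}^T = \mathbf{0}$ is a sanity test for one example, not a substitute for the algebraic verification left as an exercise.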


Example 7.4.4 (running example: a (5, 2) linear code) Let us now construct a somewhat
less trivial linear code which will serve as a running example for illustrating some basic concepts.
Suppose that we have $k = 2$ information bits $u_1, u_2 \in \{0, 1\}$ that we wish to protect. We map
these (using a systematic encoding) to a codeword of length 5 using a combination of repetition
and parity check, as follows:

$$\mathbf{x} = (u_1, u_2, u_1, u_2, u_1 \oplus u_2) \qquad (7.26)$$

A systematic generator matrix for this (5, 2) code can be constructed by considering the two
codewords corresponding to $\mathbf{u} = (1, 0)$ and $\mathbf{u} = (0, 1)$, respectively, which gives:

$$\mathbf{G} = \begin{pmatrix} 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 1 & 1 \end{pmatrix} \qquad (7.27)$$

We can read off a parity check matrix using (7.24) and (7.25) to obtain:

$$\mathbf{H} = \begin{pmatrix} 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 & 1 \end{pmatrix} \qquad (7.28)$$

Any codeword $\mathbf{x} = (x_1, ..., x_5)$ must satisfy $\mathbf{H}\mathbf{x}^T = \mathbf{0}$, which corresponds to the following parity
check equations:

$$x_1 \oplus x_3 = 0$$
$$x_2 \oplus x_4 = 0$$
$$x_1 \oplus x_2 \oplus x_5 = 0$$
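As a quick check of the running example, the sketch below encodes all four information vectors using (7.26) and confirms that each resulting codeword satisfies the three parity check equations of (7.28). The function names are our own.

```python
from itertools import product

def encode_52(u1, u2):
    """Systematic encoding (7.26): two repetitions plus one parity bit."""
    return (u1, u2, u1, u2, u1 ^ u2)

# Parity check matrix (7.28).
H = [
    (1, 0, 1, 0, 0),
    (0, 1, 0, 1, 0),
    (1, 1, 0, 0, 1),
]

def satisfies_checks(x):
    return all(sum(h * b for h, b in zip(row, x)) % 2 == 0 for row in H)

codewords = [encode_52(u1, u2) for u1, u2 in product((0, 1), repeat=2)]
for x in codewords:
    print("".join(map(str, x)), satisfies_checks(x))
```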


Suppose, now, that we transmit the (5, 2) code that we have just constructed over a BSC with
crossover probability $p$. That is, we send a codeword $\mathbf{x} = (x_1, ..., x_5)$ using the channel $n = 5$
times. According to our discrete memoryless channel model, errors occur independently for each
of the code symbols, and we get the output $\mathbf{y} = (y_1, ..., y_5)$, where $y_i = x_i$ with probability
$1 - p$, and $y_i = x_i \oplus 1$ (i.e., the bit is flipped) with probability $p$. How should we try to
decode (i.e., estimate which codeword $\mathbf{x}$ was sent from the noisy output $\mathbf{y}$)? And how do we
evaluate the performance of our decoding rule? In order to relate these to the structure of the
code, it is useful to reiterate the notion of Hamming distance, and to introduce the concept of
Hamming weight.

Hamming distance: The Hamming distance $d_H(\mathbf{u}, \mathbf{v})$ between two binary vectors $\mathbf{u}$ and $\mathbf{v}$ of
equal length is the number of places in which they differ.

For example, the Hamming distance between the two rows of the generator matrix $\mathbf{G}$ in (7.27)
is given by $d_H(\mathbf{g}_1, \mathbf{g}_2) = 4$.

Hamming weight: The Hamming weight $w_H(\mathbf{u})$ of a binary vector $\mathbf{u}$ equals the number of
ones it contains.

For example, the Hamming weight of each row of the generator matrix $\mathbf{G}$ in (7.27) is 3.

The Hamming distance between two vectors $\mathbf{u}$ and $\mathbf{v}$ is the Hamming weight of their binary sum:

$$d_H(\mathbf{u}, \mathbf{v}) = w_H(\mathbf{u} \oplus \mathbf{v}) \qquad (7.29)$$
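These two definitions, and the relation (7.29) between them, can be sketched directly. The function names are our own, and vectors are again represented as tuples of bits.

```python
def hamming_weight(u):
    """Number of ones in a binary vector."""
    return sum(u)

def hamming_distance(u, v):
    """Number of positions in which two equal-length binary vectors differ."""
    if len(u) != len(v):
        raise ValueError("vectors must have equal length")
    return sum(a != b for a, b in zip(u, v))

def binary_sum(u, v):
    return tuple(a ^ b for a, b in zip(u, v))

# Rows of the generator matrix (7.27).
g1 = (1, 0, 1, 0, 1)
g2 = (0, 1, 0, 1, 1)

print(hamming_distance(g1, g2))
print(hamming_weight(g1), hamming_weight(g2))
# Relation (7.29): distance equals the weight of the binary sum.
print(hamming_distance(g1, g2) == hamming_weight(binary_sum(g1, g2)))
```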

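Structure of an $(n, k)$ linear code: Consider a specific codeword $\mathbf{x}_0$ in a linear code $C$,
and consider its Hamming distance from another codeword $\mathbf{x} \in C$. We know that
$d_H(\mathbf{x}_0, \mathbf{x}) = w_H(\mathbf{x}_0 \oplus \mathbf{x})$. By linearity, $\tilde{\mathbf{x}} = \mathbf{x}_0 \oplus \mathbf{x}$ is also a codeword in $C$, and distinct choices of $\mathbf{x}$ give
distinct codewords $\tilde{\mathbf{x}}$. Thus, as we run through all possible codewords $\mathbf{x} \in C$, we obtain all
possible codewords $\tilde{\mathbf{x}} \in C$ (including $\tilde{\mathbf{x}} = \mathbf{0}$ for $\mathbf{x} = \mathbf{x}_0$). Thus, $d_H(\mathbf{x}_0, \mathbf{x}) = w_H(\tilde{\mathbf{x}})$, so that the
set of Hamming distances between $\mathbf{x}_0$ and all codewords in $C$ (running through all $2^k$ choices of
$\mathbf{x}$) is precisely the set of weights that the codewords in $C$ have (corresponding to the $2^k$ distinct
vectors $\tilde{\mathbf{x}}$, one for each $\mathbf{x}$).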

Minimum distance: The minimum distance of a code is defined as

$$d_{min} = \min_{\mathbf{x}_1, \mathbf{x}_2 \in C, \; \mathbf{x}_1 \neq \mathbf{x}_2} d_H(\mathbf{x}_1, \mathbf{x}_2)$$

Applying (7.29), and noting that, for a linear code, $\mathbf{x}_1 \oplus \mathbf{x}_2$ is a nonzero codeword in $C$, we see
that the minimum distance equals the minimum weight among all nonzero codewords. That is,

$$d_{min} = w_{min} = \min_{\mathbf{x} \in C, \; \mathbf{x} \neq \mathbf{0}} w_H(\mathbf{x}), \quad \text{for a linear code} \qquad (7.30)$$

The (5, 2) code is small enough that we can simply list all four codewords: 00000, 10101, 01011,
and 11110, from which we see that $w_{min} = d_{min} = 3$.
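For a code this small, both sides of (7.30) can be computed by enumeration. The sketch below does so for the four codewords of the (5, 2) code; the variable names are our own.

```python
from itertools import combinations

CODEWORDS = [
    (0, 0, 0, 0, 0),
    (1, 0, 1, 0, 1),
    (0, 1, 0, 1, 1),
    (1, 1, 1, 1, 0),
]

def hamming_distance(u, v):
    return sum(a != b for a, b in zip(u, v))

# Minimum distance: minimize over all pairs of distinct codewords.
d_min = min(hamming_distance(x1, x2) for x1, x2 in combinations(CODEWORDS, 2))

# Minimum weight: minimize over nonzero codewords only.
w_min = min(sum(x) for x in CODEWORDS if any(x))

print(d_min, w_min)
```

The pairwise search examines on the order of $2^{2k}$ pairs, while the weight search examines only $2^k$ codewords, which illustrates the practical value of (7.30).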

Guarantees on error correction: A code is guaranteed to correct $t$ errors if

$$2t + 1 \leq d_{min} \qquad (7.31)$$

It is quite easy to see why: we can set up non-overlapping “decoding spheres” of radius $t$ around
any codeword. The decoding sphere of radius $t$ around a codeword $\mathbf{x}$ is defined as the set of
vectors $\mathbf{y}$ within Hamming distance $t$ of the codeword, as follows:

$$D_t(\mathbf{x}) = \{\mathbf{y} : d_H(\mathbf{y}, \mathbf{x}) \leq t\}$$

The condition (7.31) guarantees that these decoding spheres do not overlap. Thus, if we make at
most $t$ errors, we are guaranteed that the received vector falls into the unique decoding sphere
corresponding to the transmitted codeword.
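The non-overlap property can be checked by brute force for the (5, 2) code. The sketch below builds the decoding sphere of radius $t$ around each codeword and tests whether any vector lies in more than one sphere. The helper names `sphere` and `spheres_disjoint` are our own.

```python
from itertools import product

CODEWORDS = [
    (0, 0, 0, 0, 0),
    (1, 0, 1, 0, 1),
    (0, 1, 0, 1, 1),
    (1, 1, 1, 1, 0),
]

def hamming_distance(u, v):
    return sum(a != b for a, b in zip(u, v))

def sphere(x, t):
    """All binary vectors within Hamming distance t of x."""
    return {y for y in product((0, 1), repeat=len(x)) if hamming_distance(x, y) <= t}

def spheres_disjoint(codewords, t):
    """True if no vector belongs to two different decoding spheres."""
    spheres = [sphere(x, t) for x in codewords]
    total = sum(len(s) for s in spheres)
    return total == len(set().union(*spheres))

print(spheres_disjoint(CODEWORDS, 1))  # radius allowed by 2t + 1 <= d_min = 3
print(spheres_disjoint(CODEWORDS, 2))  # radius too large for d_min = 3
print(sum(len(sphere(x, 1)) for x in CODEWORDS))  # vectors covered at radius 1
```

At radius 1 the spheres cover only part of the 32 possible vectors; the leftover vectors are those for which bounded distance decoding declares failure.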

Erasures: There are some scenarios for which it is useful to introduce the concept of erasures,
which correspond to assigning a “don’t know” to a symbol rather than making a hard decision.
Using a similar argument as before, we can state that a code is guaranteed to correct $t$ errors
and $e$ erasures if

$$2t + e + 1 \leq d_{min} \qquad (7.32)$$

Since it is “twice as easy” to correct erasures as to correct errors, we may choose to design
a demodulator to put out erasures in regions where we are uncertain about our hard decision.
For a binary channel, this means that our input alphabet is {0, 1} but our output alphabet is
{0, 1, e}, where e denotes erasure. As we see in Section 7.5, we can go further down this path in
hedging our bets, with the decoder using soft decisions which take values in a real-valued output
alphabet.
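A brute force erasure decoder for the (5, 2) code illustrates (7.32) with $t = 0$. This is a sketch under our own conventions: erased positions are marked with `None`, and the decoder returns every codeword consistent with the unerased positions. With two erasures the answer is unique, as guaranteed; with three erasures it need not be.

```python
CODEWORDS = [
    (0, 0, 0, 0, 0),
    (1, 0, 1, 0, 1),
    (0, 1, 0, 1, 1),
    (1, 1, 1, 1, 0),
]

def consistent_codewords(received):
    """Codewords that agree with the received word on every unerased position."""
    return [
        x for x in CODEWORDS
        if all(r is None or r == b for r, b in zip(received, x))
    ]

# Codeword 10101 with the first and third bits erased (e = 2, within the guarantee).
two_erasures = (None, 0, None, 0, 1)
print(consistent_codewords(two_erasures))

# Codeword 10101 with the first, third and fifth bits erased (e = 3, beyond the guarantee).
three_erasures = (None, 0, None, 0, None)
print(consistent_codewords(three_erasures))
```

The ambiguity in the second case arises because the erased positions coincide with the three positions in which 00000 and 10101 differ.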

Running example: Our (5, 2) code has $d_{min} = 3$, and hence can correct 1 error or 2 erasures (but
not both). Let us see how we would structure brute force decoding of a single error, by writing
down which vectors fall within decoding spheres of unit radius around each codeword, and also
pointing out which vectors are left over. This is done by writing all $2^5$ possible binary vectors in
what is termed a standard array.

00000   10101   01011   11110
-----------------------------
10000   00101   11011   01110
01000   11101   00011   10110
00100   10001   01111   11010
00010   10111   01001   11100
00001   10100   01010   11111
=============================
11000   01101   10011   00110
01100   11001   00111   10010

Table 7.1: Standard array for the (5, 2) code

Let us take advantage of this example to describe the general structure of a standard array
for an $(n, k)$ linear code. The array has $2^{n-k}$ rows and $2^k$ columns, and contains all possible
binary vectors of length $n$. The first row of the array consists of the $2^k$ codewords, starting with
the all-zero codeword. The first column consists of error patterns ordered by weight (ties broken
arbitrarily), starting with no errors in the first row, $\mathbf{e}_1 = \mathbf{0}$. In general, denoting the first element
of the $i$th row as the error pattern $\mathbf{e}_i$, the $j$th element in the $i$th row is $\mathbf{a}_{ij} = \mathbf{e}_i + \mathbf{x}_j$, where $\mathbf{x}_j$
denotes the $j$th codeword, $j = 1, ..., 2^k$. That is, the $(i, j)$th element in the standard array is the
$j$th codeword translated by the $i$th error pattern. For the standard array in Table 7.1 for the (5, 2)
code, the first row consists of the four codewords. We demarcate it from all the other entries in
the table, which are not codewords, by a horizontal line. The next five rows correspond to the
five possible one-bit error patterns, which we know can be corrected. Thus, for the $j$th column,
the first six rows correspond to the decoding sphere of Hamming radius one around codeword $\mathbf{x}_j$.
We demarcate this by drawing a double line under the sixth row. Beyond these, the first entries
of the remaining rows are arbitrarily set to be minimum weight binary vectors that have not
appeared yet. We cannot guarantee that we can correct these error patterns. For example, the
first and fourth entries in rows 7 and 8 are both equidistant from the first and fourth codewords,
hence neither of these patterns can be mapped unambiguously to a decoding sphere.
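The construction just described can be sketched as a short routine. Since ties between equal-weight error patterns are broken arbitrarily, we adopt one particular tie-breaking rule (ordering by the reversed bit pattern), which is our own assumption chosen so that the output matches the row ordering of Table 7.1. The function name `standard_array` is also our own.

```python
from itertools import product

def standard_array(codewords):
    """Build the standard array; codewords[0] must be the all-zero codeword."""
    n = len(codewords[0])
    # Candidate coset leaders sorted by weight. The secondary key is an assumed
    # tie-breaking rule; any other rule would give an equally valid array.
    candidates = sorted(product((0, 1), repeat=n), key=lambda v: (sum(v), v[::-1]))
    seen, rows = set(), []
    for e in candidates:
        if e in seen:
            continue  # this vector already appears in an earlier coset
        row = [tuple(a ^ b for a, b in zip(e, x)) for x in codewords]
        rows.append(row)
        seen.update(row)
    return rows

CODEWORDS = [
    (0, 0, 0, 0, 0),
    (1, 0, 1, 0, 1),
    (0, 1, 0, 1, 1),
    (1, 1, 1, 1, 0),
]

array = standard_array(CODEWORDS)
for row in array:
    print("   ".join("".join(map(str, v)) for v in row))
```

Each row is generated by translating the whole code by a vector not yet seen, so every binary vector of length $n$ appears exactly once.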

Bounded distance decoding: For a code capable of correcting at least $t$ errors, bounded distance
decoding at radius $t$ corresponds to the following rule: decode a received word to the nearest
codeword (in terms of Hamming distance), as long as the distance is at most $t$, and declare decoding
failure if there is no such codeword. A conceptually simple, but computationally inefficient,
way to think about this is in terms of the standard array. For our running example in Table 7.1,
bounded distance decoding with $t = 1$ could be implemented by checking if the received word is
anywhere in the first six rows, and if so, decoding it to the first element of the column it falls in.
For example, the received word 10001 is in the fourth row and second column, and is therefore
decoded to the second codeword 10101. If the received word is not in the first six rows,
then we declare decoding failure. For example, the received word 01101 is in the seventh row
and hence does not fall in the decoding sphere of radius one for any codeword, so we would
declare decoding failure if we received it.

Each row of the standard array is the translation of the code $C$ by its first entry, $\mathbf{e}_i$, and is called
a coset of the code. The first entry $\mathbf{e}_i$ is called the coset leader for the $i$th coset, $i = 1, ..., 2^{n-k}$.
We now note that a coset can be described far more economically than by listing all its elements.
Applying a parity check matrix to the $j$th element of the $i$th coset, $\mathbf{H}(\mathbf{x}_j \oplus \mathbf{e}_i)^T = \mathbf{H}\mathbf{e}_i^T$, we
get an answer that depends only on the coset leader, since $\mathbf{H}\mathbf{x}^T = \mathbf{0}$ for any codeword $\mathbf{x}$. We
therefore define the syndrome for the $i$th coset as $\mathbf{s}_i = \mathbf{H}\mathbf{e}_i^T$. The syndrome is a binary vector of
length $n-k$, and takes $2^{n-k}$ possible values. The coset leaders and syndromes corresponding to
Table 7.1, using the parity check matrix (7.28), are listed in Table 7.2.


Coset leader   Syndrome
00000          000
10000          101
01000          011
00100          100
00010          010
00001          001
11000          110
01100          111

Table 7.2: Mapping between coset leaders and syndromes for the (5, 2) code using (7.28)
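The syndromes in Table 7.2 can be reproduced by applying the parity check matrix (7.28) to each coset leader. A minimal sketch, with names of our own choosing and vectors written as bit strings for readability:

```python
# Parity check matrix (7.28), one string per row.
H = ["10100", "01010", "11001"]

def syndrome(e):
    """Compute H e^T over GF(2) for a bit string e; returns a bit string of length n - k."""
    return "".join(
        str(sum(int(h) * int(b) for h, b in zip(row, e)) % 2) for row in H
    )

COSET_LEADERS = ["00000", "10000", "01000", "00100", "00010", "00001", "11000", "01100"]

table = {e: syndrome(e) for e in COSET_LEADERS}
for leader, s in table.items():
    print(leader, s)
```

Since the eight syndromes are distinct, the syndrome identifies the coset uniquely, which is what makes syndrome-based decoding possible.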

Bounded distance decoding using syndromes: Consider a received word $\mathbf{y}$. Compute the syndrome
$\mathbf{s} = \mathbf{H}\mathbf{y}^T$. If the syndrome corresponds to a coset leader $\mathbf{e}$ that is within the decoding sphere of
interest, then we estimate the transmitted codeword as $\hat{\mathbf{x}} = \mathbf{y} + \mathbf{e}$. Consider again the received
word $\mathbf{y} = 10001$ and compute its syndrome $\mathbf{s} = \mathbf{H}\mathbf{y}^T = 100$. This corresponds to the fourth
row in Table 7.2, which we know is within a decoding sphere of radius one. The coset leader is
$\mathbf{e} = 00100$. Adding this to the received word, we obtain $\hat{\mathbf{x}} = \mathbf{y} + \mathbf{e} = 10101$, which is the same
result that we obtained by direct look-up in the standard array.
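The syndrome-based procedure can be sketched as follows for radius $t = 1$. The look-up table maps each syndrome to a coset leader of weight at most one; syndromes outside the table lead to a declared decoding failure, which we represent by returning `None` (a convention of our own). All names are illustrative.

```python
# Parity check matrix (7.28).
H = [
    (1, 0, 1, 0, 0),
    (0, 1, 0, 1, 0),
    (1, 1, 0, 0, 1),
]
N = 5

def syndrome(y):
    return tuple(sum(h * b for h, b in zip(row, y)) % 2 for row in H)

# Coset leaders within the decoding sphere of radius one: no error or a single error.
leaders = [tuple(0 for _ in range(N))]
leaders += [tuple(1 if j == i else 0 for j in range(N)) for i in range(N)]
SYNDROME_TO_LEADER = {syndrome(e): e for e in leaders}

def decode(y):
    """Bounded distance decoding at radius 1; returns None on decoding failure."""
    e = SYNDROME_TO_LEADER.get(syndrome(y))
    if e is None:
        return None
    return tuple(a ^ b for a, b in zip(y, e))

print(decode((1, 0, 0, 0, 1)))  # received word 10001 from the text
print(decode((0, 1, 1, 0, 1)))  # received word 01101, outside all radius-one spheres
```

The table has at most $2^{n-k}$ entries, compared with $2^n$ entries in the full standard array.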

Performance of bounded distance decoding: Correct decoding occurs if the received word is
mapped to the transmitted word. For bounded distance decoding with $t = 1$ for the (5, 2)
code, this happens if and only if there is at most one error. Thus, when a codeword for the (5, 2)
code is sent over a BSC with crossover probability $p$, the probability of correct decoding is given
by

$$P_c = (1-p)^5 + \binom{5}{1} p (1-p)^4$$

If the decoding is not correct, let us term the event incorrect decoding. One of two things
happens when the decoding is incorrect: the received word falls outside the decoding sphere of all
codewords, hence we declare decoding failure, or the received word falls inside the decoding sphere
of one of the incorrect codewords, and we have an undetected error. The sum of the probabilities
of these two events is $P_e = 1 - P_c$. Since decoding failure (where we know something has gone
wrong) is less damaging than decoding error (where we do not realize that we have made errors),
we would like its probability $P_{df}$ to be much larger than the probability $P_{ue}$ of undetected error.
For large block lengths $n$, we can typically design codes for which this is possible, hence we
often take $P_e$ as a proxy for decoding failure. For our simple running example, we compute the
probabilities of decoding failure and decoding error in Problem 7.13. Exact computations of $P_{df}$
and $P_{ue}$ are difficult for more complex codes, hence we typically resort to bounds and simulations.
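For the (5, 2) code, the three outcome probabilities can be obtained by enumerating all 32 error patterns. The sketch below assumes, without loss of generality for a linear code on a BSC, that the all-zero codeword is sent, so that the received word equals the error pattern. The function name and the value of $p$ used are our own choices for illustration.

```python
from itertools import product
from math import comb

CODEWORDS = [
    (0, 0, 0, 0, 0),
    (1, 0, 1, 0, 1),
    (0, 1, 0, 1, 1),
    (1, 1, 1, 1, 0),
]

def hamming_distance(u, v):
    return sum(a != b for a, b in zip(u, v))

def outcome_probabilities(p, t=1):
    """Return (P_c, P_df, P_ue), assuming the all-zero codeword is transmitted."""
    p_c = p_df = p_ue = 0.0
    for e in product((0, 1), repeat=5):
        w = sum(e)
        prob = p**w * (1 - p) ** (5 - w)
        if w <= t:
            p_c += prob   # inside the sphere of the transmitted codeword
        elif any(hamming_distance(e, x) <= t for x in CODEWORDS[1:]):
            p_ue += prob  # inside the sphere of an incorrect codeword
        else:
            p_df += prob  # outside all spheres: decoding failure
    return p_c, p_df, p_ue

p = 0.01  # illustrative crossover probability
p_c, p_df, p_ue = outcome_probabilities(p)
closed_form = (1 - p) ** 5 + comb(5, 1) * p * (1 - p) ** 4
print(p_c, closed_form)
print(p_df, p_ue, p_c + p_df + p_ue)
```

Enumeration over $2^n$ error patterns is feasible only for toy examples, consistent with the remark that bounds and simulations are needed for more complex codes.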

Even when we use syndromes to infer coset leaders rather than searching the entire standard
array, look-up based approaches to decoding do not scale well as we increase the code block
length $n$ and the decoding radius. A significant achievement of classical coding theory has
been to construct codes whose algebraic structure can be exploited to devise efficient means
of mapping syndromes to coset leaders for bounded distance decoding (such methods typically
involve finding roots of polynomials over finite fields). However, much of the recent progress
in coding has resulted from the development of iterative decoding algorithms based on message
passing architectures, which permit efficient decoding of very long, random-looking codes that
can approach Shannon limits. We now provide a simple illustration of message passing via our
running example of the (5, 2) code.


[Figure: variable nodes $x_1, ..., x_5$ and check nodes $c_1, c_2, c_3$]

Figure 7.9: Tanner graph for (5, 2) code with parity check matrix given by (7.28).


Tanner graph: A binary linear code with parity check matrix $\mathbf{H}$ can be represented as a Tanner
graph, with variable nodes representing the coded bits, and check nodes representing the parity
check equations. A variable node is connected to a parity check node by an edge if it appears
in that parity check equation. A Tanner graph for our running example (5, 2) code, based on
the parity check matrix (7.28), is shown in Figure 7.9. Check node $c_1$ corresponds to the parity
check equation specified by the first row of (7.28), $x_1 \oplus x_3 = 0$, and is therefore connected to $x_1$
and $x_3$. Check node $c_2$ corresponds to the second row, $x_2 \oplus x_4 = 0$, and is therefore connected to
$x_2$ and $x_4$. Check node $c_3$ corresponds to the third row, $x_1 \oplus x_2 \oplus x_5 = 0$, and is connected to $x_1$,
$x_2$, and $x_5$. The degree of a node is defined to be the number of edges incident on it. The variable
nodes $x_1, ..., x_5$ have degrees 2, 2, 1, 1, and 1, respectively. The check nodes $c_1, c_2, c_3$ have degrees
2, 2 and 3, respectively. The success of message passing on Tanner graphs is sensitive to these
degrees, as we shall see shortly.
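The edges and node degrees of a Tanner graph follow directly from the parity check matrix: there is an edge between check node $i$ and variable node $j$ whenever entry $(i, j)$ of $\mathbf{H}$ equals 1. A minimal sketch, with names of our own choosing:

```python
# Parity check matrix (7.28): rows are check nodes, columns are variable nodes.
H = [
    (1, 0, 1, 0, 0),
    (0, 1, 0, 1, 0),
    (1, 1, 0, 0, 1),
]

# Edge list using 1-based labels to match the text (c1..c3, x1..x5).
edges = [
    (f"c{i + 1}", f"x{j + 1}")
    for i, row in enumerate(H)
    for j, entry in enumerate(row)
    if entry
]

# Variable node degree = column weight; check node degree = row weight.
variable_degrees = [sum(row[j] for row in H) for j in range(len(H[0]))]
check_degrees = [sum(row) for row in H]

print(edges)
print(variable_degrees)
print(check_degrees)
```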


Figure 7.10: Incoming and outgoing messages for a check node. (For incoming messages $b$ and $c$, the outgoing message on the remaining edge is $b \oplus c$.)


Bit flipping based decoding: Let us now consider the following simple message passing
algorithm, illustrated via the example in Figure 7.11. As shown in the example, each variable
node maintains an estimate of the associated bit, initialized by what was received from the
channel. In the particular example we consider, the received sequence is 10000. We know from
Table 7.1 that a bounded distance decoder would map this to the codeword 00000. In message
passing for bit flipping, each variable node sends out its current bit estimate on all outgoing
edges. Each check node uses these incoming messages to generate new messages back to the
variable nodes, as illustrated in Figure 7.10, which shows a check node of degree 3. That is, the
message sent back to a variable node is the value that bit should take in order to satisfy that
particular parity check, assuming that the messages coming in from the other variable nodes
are correct. When the variable nodes get these messages, they flip their bits if “enough” check
node messages tell them to. In our example of a (5, 2) code, let us employ the following rule: a
variable node flips its channel bit if (a) all the check messages coming into it tell it to, and (b)
the number of check messages is more than one (so as to provide enough evidence to override
the current estimate).

Figure 7.11 shows how bit flipping can be used to correct the one-bit error pattern 10000. Both
check node messages to variable node $x_1$ say that it should take value 0, and cause it to flip
to the correct value. On the other hand, Figure 7.12 shows that bit flipping gets stuck for the
one-bit error pattern 00001, because there is only one check message coming into variable node
$x_5$, which is not enough to flip it. Note that both of these error patterns are correctable using
bounded distance decoding, using Table 7.1 or Table 7.2. This reveals an important property of
iterative decoding on Tanner graphs: its success depends critically on the node degrees, which of
course depend on the particular choice of parity check matrix used to specify the Tanner graph.


Figure  7.11:  Bit  flipping  based  decoding  for  the  (5,2)  code  is  successful  for  this  error  pattern. 


Can  we  fix  the  problem  revealed  by  the  example  in  Figure  7.12?  Perhaps  we  can  choose  a  different 
parity  check  matrix  for  which  the  Tanner  graph  has  variable  nodes  of  degree  at  least  2,  so  that 
bit  flipping  has  a  chance  of  working?  For  codes  over  large  block  lengths,  it  is  actually  possible 
to  use  a  randomized  approach  for  the  design  of  parity  check  matrices  yielding  desirable  degree 
distributions, enabling spectacular performance approaching Shannon limits. In these regimes,
iterative decoding goes well beyond the error correction capability guarantees associated with
the code’s minimum distance. However, such results do not apply to the simple example we are
considering, where iterative decoding is having trouble decoding even up to the guarantee of t = 1
associated with a minimum distance dmin = 3. Nevertheless, this gives us the opportunity to present
a trick that can be useful even for large block lengths: use redundant parity check nodes, adding
one or more rows to the parity check matrix that are linearly dependent on other rows. Figure
7.13 shows a Tanner graph for the (5,2) code with a redundant check node C4 corresponding to
x3 ⊕ x4 ⊕ x5 = 0. That is, we have added a fourth row 00111 to the parity check matrix (7.25).
This  row  is  actually  a  sum  of  the  first  three  rows,  and  hence  would  add  no  further  information 
if  we  were  just  performing  look-up  based  bounded  distance  decoding.  However,  revisiting  the 
troublesome  error  pattern  00001,  we  see  that  this  redundant  check  makes  all  the  difference  in 
the performance of bit flipping based decoding; as Figure 7.14 shows, the pattern can now be
corrected.


Figure 7.12: Bit flipping based decoding for the (5, 2) code is unsuccessful for this error pattern,
even though it is correctable using bounded distance decoding.


Figure  7.13:  Tanner  graph  for  (5,  2)  code  with  one  redundant  parity  check. 


7.5  Soft  decisions  and  belief  propagation 

We  have  discussed  decoding  of  linear  block  codes  based  on  hard  decision  inputs,  where  the  input 
to  the  decoder  is  a  string  of  bits.  However,  these  bits  are  sent  over  a  channel  using  modulation 
techniques  such  as  those  discussed  in  Chapter  4,  and  as  discussed  in  Chapter  6,  it  is  possible  to 
extract  soft  decisions  that  capture  more  of  the  information  we  have  about  the  physical  channel. 
In  this  section,  we  discuss  how  soft  decisions  can  be  used  in  iterative  decoding,  illustrating  the 
key  concepts  using  our  running  example  (5,  2)  code.  We  restrict  attention  to  BPSK  modulation 
over  a  discrete-time  AWGN  channel,  but  as  the  discussion  in  Chapter  6  indicates,  the  concept 
of  soft  decisions  is  applicable  to  any  signaling  scheme. 

Figure 7.14: Bit flipping based decoding for the (5,2) code using a redundant parity check is
now successful for the 00001 error pattern.


A codeword x = (x[1], ..., x[n]) with elements taking values in {0, 1} can be mapped to a sequence
of BPSK symbols using the transformation

$$b[m] = (-1)^{x[m]}, \quad m = 1, \ldots, n \qquad (7.33)$$

The  advantage  of  this  map  is  that  it  transforms  binary  addition  into  real-valued  multiplication. 
That is, $x[m_1] \oplus x[m_2]$ maps to $b[m_1] b[m_2]$. The BPSK symbols are transmitted over a
discrete-time AWGN channel, with received symbols given by

$$y[m] = A b[m] + w[m] = A(-1)^{x[m]} + w[m], \quad m = 1, \ldots, n \qquad (7.34)$$


where the amplitude $A = \sqrt{E_s}$, where $E_s$ denotes the energy/symbol, and $w[m] \sim N(0, \sigma^2)$
are i.i.d. discrete time WGN samples. To simplify notation, consider a single bit $x \in \{0, 1\}$,
mapped to $b \in \{-1, +1\}$, with received sample $y = Ab + w$, $w \sim N(0, \sigma^2)$. Consider the posterior
probabilities $P[x = 0|y]$ and $P[x = 1|y]$. Since $P[x = 0|y] + P[x = 1|y] = 1$, we can convey
information regarding these probabilities in a number of ways. One particularly convenient
format is the log likelihood ratio (LLR), defined as


$$L(x) = \log \frac{P[x = 0]}{P[x = 1]} = \log \frac{P[b = +1]}{P[b = -1]} \qquad (7.35)$$


where we omit the conditioning on y to simplify notation. We can go from LLRs to bit probabilities
as follows:


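$$P[x = 0] = \frac{e^{L(x)}}{e^{L(x)} + 1}, \quad P[x = 1] = \frac{1}{e^{L(x)} + 1} \qquad (7.36)$$

We can go from LLRs to hard decisions as follows:

$$\hat{b} = \mathrm{sign}(L), \quad \hat{x} = I_{\{L < 0\}} = I_{\{\hat{b} < 0\}} \qquad (7.37)$$

where $\hat{b} \in \{-1, +1\}$ is the “BPSK” version of $\hat{x} \in \{0, 1\}$.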

Suppose that the prior probability of bit x taking value 0 is $\pi_0(x)$. This notation implies that the
prior probability could vary across bits: while we do not need this for the examples considered
here, allowing this level of generality is useful for some decoder structures. For example, for turbo
codes, the information about bit x supplied by a given decoder component may be interpreted
as its prior probability by another decoder component. We can now apply Bayes’ rule to show
(see Problem 7.17) that the LLR decomposes as follows:


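$$L(x) = L_{\mathrm{prior}}(x) + L_{\mathrm{channel}}(x) \qquad (7.38)$$

where

$$L_{\mathrm{prior}}(x) = \log \frac{\pi_0(x)}{1 - \pi_0(x)} \qquad (7.39)$$

and

$$L_{\mathrm{channel}}(x) = \frac{2Ay}{\sigma^2} \qquad (7.40)$$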


Thus,  the  use  of  the  logarithm  enables  an  additive  decomposition  of  information  from  independent 
sources,  which  is  both  intuitively  pleasing  and  computationally  useful.  For  our  present  purpose, 
we  can  assume  that  x  takes  values  from  {0, 1}  with  equal  probability,  so  that  Lprior  =  0.  The 
LLR  L(x)  represents  the  strength  of  our  belief  in  whether  the  bit  is  0  or  1,  and  LLR-based 
message  passing  for  iterative  decoding  is  referred  to  as  belief  propagation. 
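
The relations (7.36) to (7.40) can be tried out numerically. The following is a minimal sketch in Python (rather than the Matlab used in the code fragments of this chapter); the function names and the numerical values A = 1, σ² = 0.5 and y = −0.3 are illustrative assumptions, not taken from the text.

```python
import math

def channel_llr(y, amplitude, sigma2):
    """Channel LLR for BPSK over AWGN, as in (7.40): 2*A*y / sigma^2."""
    return 2.0 * amplitude * y / sigma2

def prior_llr(pi0):
    """Prior LLR, as in (7.39): log(pi0 / (1 - pi0)), valid for 0 < pi0 < 1."""
    return math.log(pi0 / (1.0 - pi0))

def llr_to_probs(llr):
    """Return (P[x=0], P[x=1]) from an LLR, as in (7.36).

    Note: math.exp overflows for very large positive LLRs (roughly above 700).
    """
    p1 = 1.0 / (math.exp(llr) + 1.0)
    return 1.0 - p1, p1

def hard_decision(llr):
    """Hard decision on the bit, as in (7.37): x_hat = 1 if L < 0, else 0."""
    return 1 if llr < 0 else 0

# Illustrative (assumed) numbers: equal priors, A = 1, sigma^2 = 0.5, y = -0.3
llr = prior_llr(0.5) + channel_llr(-0.3, 1.0, 0.5)
p0, p1 = llr_to_probs(llr)
print(llr, p0, p1, hard_decision(llr))
```

With equal priors the prior LLR is zero, so the LLR here comes entirely from the channel; a negative received sample gives a negative LLR and hence a hard decision of 1.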


Figure 7.15: Variable node update. With incoming messages u1, u2, u3 from the check nodes and
channel LLR Lc, the outgoing messages are v1 = u2 + u3 + Lc, v2 = u1 + u3 + Lc, and
v3 = u1 + u2 + Lc.


Belief  propagation:  We  describe  belief  propagation  over  a  Tanner  graph  for  a  linear  block  code 
by  specifying  message  generation  at  a  generic  variable  node  and  a  generic  check  node.  In  belief 
propagation,  the  message  going  out  on  an  edge  is  a  function  of  the  messages  coming  in  on  all  of 
the  other  edges.  At  a  variable  node,  all  of  the  LLRs  involved  refer  to  a  given  bit,  with  information 
coming  in  from  the  channel  and  from  check  nodes.  A  key  approximation  in  belief  propagation 
is  to  approximate  all  of  these  as  independent  sources  of  information,  so  that  the  corresponding 
LLRs add up; this is an excellent approximation for large block lengths that may not really apply
to our small running example, but we will go ahead and use it anyway in our numerical examples.
Figure 7.15 shows generation of an outgoing message from a variable node: the outgoing message
on an edge is the sum of the incoming messages from all other edges (including from the channel
as  well  as  from  the  check  nodes).  Thus,  the  outgoing  message  on  a  given  edge  equals  the  sum 
of  all  incoming  messages,  minus  the  incoming  message  on  that  edge,  and  this  is  the  way  we 
implement  it  in  the  code  fragment  below.  For  simplicity,  a  node  of  degree  three  (not  counting 
the  edge  coming  from  the  channel)  is  shown  in  Figure  7.15,  but  the  computation  (and  the  code 
fragment  implementing  it)  applies  to  variable  nodes  of  arbitrary  degrees. 

Code  Fragment  7.5.1  Variable  Node  Update 

function Lout = variable_update(Lchannel,Lin)
%computes outgoing messages from a variable node
%Lchannel = LLR from channel for that variable
%Lin = vector of LLRs coming in from check nodes
%Lout = vector of LLRs going out to check nodes
%Note: dimension of Lin and Lout = variable node degree
%
%outgoing message on an edge = sum of incoming messages on all other edges
%(including LLR from channel)
%Efficient computation: sum over all edges and subtract incoming message for each edge
Lout = sum(Lin) + Lchannel - Lin; %vector of the same dimension as Lin

Exercise: A variable node of degree 3 has channel LLR 0.25, and incoming LLR messages from
check nodes -1.5, 0.5, -2.

(a) If you had to make a hard decision on the variable based on this information, what would it
be?

(b) What are the outgoing messages back to the check nodes?

Answers: (a) The hard decision would be x = 1 (b = -1). (b) The outgoing messages to the
check nodes are -1.25, -3.25, -0.75.
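
For readers who want to check the exercise numerically, here is a minimal Python sketch of the same variable node update (a translation of the idea in Code Fragment 7.5.1 into Python; the function name and list-based interface are assumptions made for this illustration).

```python
def variable_update(l_channel, l_in):
    """Outgoing LLRs from a variable node.

    l_channel: LLR from the channel for this variable
    l_in:      list of LLRs coming in from the check nodes
    Returns a list of the same length as l_in: on each edge, the sum of the
    channel LLR and the incoming messages on all *other* edges.
    """
    total = l_channel + sum(l_in)           # sum over all edges, including the channel
    return [total - l for l in l_in]        # subtract each edge's own incoming message

# Numbers from the exercise: channel LLR 0.25, incoming LLRs -1.5, 0.5, -2
l_channel = 0.25
l_in = [-1.5, 0.5, -2.0]
total_llr = l_channel + sum(l_in)           # LLR used for the hard decision
x_hat = 1 if total_llr < 0 else 0
print(total_llr, x_hat, variable_update(l_channel, l_in))
```

The total LLR is negative, so the hard decision is 1, and the three outgoing messages agree with the answers given in the exercise.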


Figure 7.16: Check node update. The outgoing messages are computed from the incoming messages
using the tanh rule: $\tanh(u_k/2) = \prod_{j \neq k} \tanh(v_j/2)$.


Message generation for check nodes, depicted in Figure 7.16, is more complicated. Consider
a check node of degree three, corresponding to the parity check equation $x_1 \oplus x_2 \oplus x_3 = 0$.
Suppose that the incoming messages tell us that the LLRs for these three bits are $v_1 = L_{in}(x_1)$,
$v_2 = L_{in}(x_2)$, and $v_3 = L_{in}(x_3)$. Let us compute the outgoing message $u_3 = L_{out}(x_3)$ on the edge
corresponding to variable $x_3$. We have


$$\begin{aligned} P_{out}[x_3 = 0] &= P_{in}[x_1 = 0, x_2 = 0] + P_{in}[x_1 = 1, x_2 = 1] \\ &= P_{in}[x_1 = 0] \, P_{in}[x_2 = 0] + P_{in}[x_1 = 1] \, P_{in}[x_2 = 1] \end{aligned}$$

Plugging in from (7.36), we obtain that

$$\frac{e^{L_{out}(x_3)}}{e^{L_{out}(x_3)} + 1} = \frac{e^{L_{in}(x_1) + L_{in}(x_2)} + 1}{(e^{L_{in}(x_1)} + 1)(e^{L_{in}(x_2)} + 1)} \qquad (7.41)$$

As shown in Problem 7.18, this simplifies to

$$\tanh(u_3/2) = \tanh(v_1/2) \tanh(v_2/2) \qquad (7.42)$$

We can decompose the preceding into (intermediate) hard decisions and reliabilities as follows:

$$\mathrm{sign}(u_3) = \mathrm{sign}(v_1) \, \mathrm{sign}(v_2) \qquad (7.43)$$

$$\log |\tanh(u_3/2)| = \log |\tanh(v_1/2)| + \log |\tanh(v_2/2)| \qquad (7.44)$$

Figure 7.16 illustrates the update for a check node of degree 3. However, these computations
generalize to a check node of arbitrary degree, as implemented in the following code fragment.

Code Fragment 7.5.2 Check Node Update

function Lout = check_update(Lin)
%computes messages going out from a check node
%Lin = vector of messages coming in from variable nodes
%Lout = vector of messages going out to variable nodes
%convert LLRs to reliabilities and signs
reliabilities_in = log(abs(tanh(Lin/2)));
signs_in = sign(Lin);
%compute check update
reliabilities_out = sum(reliabilities_in) - reliabilities_in;
sign_product = prod(signs_in);
signs_out = sign_product.*signs_in;
%convert reliabilities and signs back to LLRs
Lout = 2*atanh(exp(reliabilities_out)).*signs_out;

Exercise: A check node of degree 4 has incoming LLRs -3.5, 2.2, 0.25, 1.3.

(a)  Is  the  check  satisfied  by  the  incoming  messages?  That  is,  if  we  made  hard  decisions  based  on 
the  incoming  LLRs,  would  they  satisfy  the  parity  check  equation  corresponding  to  this  node? 

(b)  Use  code  fragment  7.5.2  to  determine  the  corresponding  outgoing  LLRs.  How  are  the  signs 
and  reliabilities  of  the  outgoing  LLRs  related  to  those  of  the  incoming  messages? 

Answers: (a) No. (b) The outgoing LLRs are 0.1139, -0.1340, -0.9217, -0.1880. The signs are
flipped,  and  the  larger  reliabilities  become  smaller,  while  the  smallest  reliability  increases.  Why 
does  this  make  sense? 
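
The check node exercise can also be reproduced with a short Python sketch that follows the same sign and reliability decomposition as (7.43) and (7.44). The function name and list-based interface are assumptions for this illustration, and the sketch assumes that no incoming LLR is exactly zero (the logarithm of the reliability would otherwise be undefined).

```python
import math

def check_update(l_in):
    """Outgoing LLRs from a check node, using signs and log-reliabilities.

    l_in: list of nonzero LLRs coming in from the variable nodes
    Returns a list of the same length: the tanh rule applied over all other edges.
    """
    # convert LLRs to reliabilities and signs
    reliabilities_in = [math.log(abs(math.tanh(l / 2.0))) for l in l_in]
    signs_in = [1.0 if l > 0 else -1.0 for l in l_in]
    # sum (or multiply) over all edges, then remove each edge's own contribution
    total_reliability = sum(reliabilities_in)
    sign_product = math.prod(signs_in)
    l_out = []
    for rel, sgn in zip(reliabilities_in, signs_in):
        rel_out = total_reliability - rel
        # multiplying by sgn removes the edge's own sign, since sgn is +1 or -1
        l_out.append(2.0 * math.atanh(math.exp(rel_out)) * sign_product * sgn)
    return l_out

# Numbers from the exercise
l_out = check_update([-3.5, 2.2, 0.25, 1.3])
print([round(l, 4) for l in l_out])
```

The printed values can be compared against the answers to part (b). Each outgoing magnitude is smaller than the smallest of the other incoming magnitudes, reflecting that a parity check is only as reliable as its least reliable participant.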

Once  we  have  defined  the  computations  at  the  variable  and  check  nodes,  all  that  is  needed  to 
implement  belief  propagation  is  to  route  messages  according  to  the  edges  defined  by  a  given 
parity  check  matrix  (of  course,  the  choice  of  code  and  parity  check  matrix  determines  whether 
iterative  decoding  is  effective).  At  any  stage  of  iterative  decoding,  we  can  make  hard  decisions 
at  a  variable  node  using  (7.37),  where  the  LLR  is  the  sum  of  all  incoming  LLRs,  including 
the channel LLR. If the resulting estimated vector $\hat{x}$ satisfies $H\hat{x}^T = 0$, then we know that we
have  obtained  a  valid  codeword  and  we  can  terminate  the  decoding.  Typically,  if  we  do  not 
obtain  a  valid  codeword  after  a  specified  number  of  iterations,  then  we  declare  decoding  failure. 
We  implement  belief  propagation  based  iterative  decoding  in  Software  Lab  7.1;  while  we  use  our 
running  example  (5,  2)  code,  the  software  developed  in  this  lab  provides  a  generic  implementation 
of  belief  propagation  for  any  linear  block  code  once  the  parity  check  matrix  is  specified. 
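
To make the message routing concrete, the following is a minimal Python sketch of the overall iteration, under stated assumptions: it is not the Software Lab 7.1 implementation, the function names and the default of 20 iterations are hypothetical, and the check node update uses a direct product of tanh terms with clipping as a practical safeguard. The example uses the (7,4) Hamming parity check matrix from Problem 7.7, with assumed channel LLRs in which the all-zero codeword is sent and one bit is weakly received with the wrong sign.

```python
import math

def variable_update(l_channel, l_in):
    """Outgoing LLRs from a variable node: total sum minus own incoming message."""
    total = l_channel + sum(l_in)
    return [total - l for l in l_in]

def check_update(l_in):
    """Outgoing LLRs from a check node via the tanh rule (product over other edges)."""
    l_out = []
    for k in range(len(l_in)):
        prod = 1.0
        for j, l in enumerate(l_in):
            if j != k:
                prod *= math.tanh(l / 2.0)
        # clip so that atanh stays finite for very reliable inputs (an added safeguard)
        prod = max(-1.0 + 1e-12, min(1.0 - 1e-12, prod))
        l_out.append(2.0 * math.atanh(prod))
    return l_out

def bp_decode(H, l_channel, max_iters=20):
    """Return (x_hat, success) after at most max_iters rounds of message passing."""
    r, n = len(H), len(H[0])
    check_nbrs = [[j for j in range(n) if H[i][j]] for i in range(r)]
    var_nbrs = [[i for i in range(r) if H[i][j]] for j in range(n)]
    c2v = {(i, j): 0.0 for i in range(r) for j in check_nbrs[i]}  # check-to-variable
    v2c = {}                                                      # variable-to-check
    x_hat = [1 if l < 0 else 0 for l in l_channel]
    for _ in range(max_iters):
        for j in range(n):
            out = variable_update(l_channel[j], [c2v[(i, j)] for i in var_nbrs[j]])
            for i, msg in zip(var_nbrs[j], out):
                v2c[(i, j)] = msg
        for i in range(r):
            out = check_update([v2c[(i, j)] for j in check_nbrs[i]])
            for j, msg in zip(check_nbrs[i], out):
                c2v[(i, j)] = msg
        # hard decisions from the sum of all incoming LLRs, including the channel LLR
        totals = [l_channel[j] + sum(c2v[(i, j)] for i in var_nbrs[j]) for j in range(n)]
        x_hat = [1 if l < 0 else 0 for l in totals]
        # stop as soon as every parity check is satisfied
        if all(sum(x_hat[j] for j in check_nbrs[i]) % 2 == 0 for i in range(r)):
            return x_hat, True
    return x_hat, False

# Parity check matrix of the (7,4) Hamming code from Problem 7.7
H = [
    [1, 0, 0, 1, 0, 1, 1],
    [0, 1, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 1, 1, 1],
]
# Assumed channel LLRs: all-zero codeword sent, first bit received with the wrong sign
l_channel = [-1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
x_hat, ok = bp_decode(H, l_channel)
print(x_hat, ok)
```

The dictionaries keyed by (check, variable) pairs are just one simple way to store one message per edge in each direction; a sparse-matrix layout would be more natural for large block lengths.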


7.6  Concept  Summary 

This  section  provides  a  glimpse  of  channel  coding  concepts,  including  fundamental  performance 
limits  established  by  Shannon  theory  and  constructive  strategies  for  approaching  these  limits. 
Key  points  are  summarized  as  follows. 

•  The  need  for  non-trivial  channel  codes  is  motivated  by  examining  two  extremes  when  sending 
a  block  of  bits  over  a  binary  symmetric  channel:  uncoded  communication  (probability  of  packet 
error  tends  to  one  as  blocklength  increases)  and  repetition  coding  (the  code  rate  tends  to  zero 
as  blocklength  increases). 

•  Channel  coding  consists  of  introducing  structured  redundancy  in  the  transmitted  bits/symbols. 
While  there  are  many  possible  coded  modulation  strategies,  we  focus  on  BICM,  a  simple,  flexible, 
and  effective  approach  cascading  a  binary  code  and  an  interleaver,  followed  by  mapping  of  bits 
to  modulated  symbols. 

Shannon  limits 

• For a given channel (fixing parameters such as power and bandwidth), Shannon theory tells us
that there is a well-defined maximum possible rate of reliable communication, termed the channel
capacity. For a passband bandlimited AWGN channel with bandwidth W (Hz), the capacity is
given by $W \log_2(1 + E_s/N_0)$, which translates to the following fundamental power-bandwidth
tradeoff:

$$E_s/N_0 > 2^r - 1, \quad E_b/N_0 > \frac{2^r - 1}{r}$$

where $E_s$ is the energy per transmitted symbol, $E_b$ is the energy per information bit, and r is the
spectral efficiency (the information bit rate normalized by the bandwidth). These results were
derived after first showing that the capacity of a discrete time real AWGN channel is given by
$\frac{1}{2} \log_2(1 + S/N)$ bits per channel use.

• The channel capacity for a BSC with crossover probability p is $1 - H_b(p) = 1 + p \log_2 p +
(1 - p) \log_2(1 - p)$ bits per channel use. For BICM, such a channel is obtained, for example, by
making  hard  decisions  on  Gray  coded  constellations. 

• Shannon limits can be used as guidelines for system sizing: for example, to choose the
combination of code rate and constellation size that is appropriate for a given SNR.

•  The  performance  of  a  given  coded  modulation  strategy  can  be  compared  to  fundamental  limits 
by comparing the SNR at which it attains a certain performance (e.g., a BER of $10^{-5}$) with the
minimum  SNR  required  for  reliable  communication  at  that  spectral  efficiency. 
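
The Shannon limits summarized above are easy to evaluate numerically. The following Python sketch (function names are assumptions for this illustration) computes the minimum Es/N0 and Eb/N0 for a given spectral efficiency r on the bandlimited AWGN channel, and the capacity of a BSC.

```python
import math

def to_db(x):
    """Convert a linear power ratio to dB."""
    return 10.0 * math.log10(x)

def min_es_n0(r):
    """Minimum Es/N0 (linear) for reliable communication at spectral efficiency r."""
    return 2.0 ** r - 1.0

def min_eb_n0(r):
    """Minimum Eb/N0 (linear) for reliable communication at spectral efficiency r > 0."""
    return (2.0 ** r - 1.0) / r

def binary_entropy(p):
    """Binary entropy function in bits, with the convention 0*log(0) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def bsc_capacity(p):
    """Capacity of a BSC with crossover probability p, in bits per channel use."""
    return 1.0 - binary_entropy(p)

for r in (0.5, 1.0, 2.0, 4.0):
    print(r, round(to_db(min_es_n0(r)), 2), round(to_db(min_eb_n0(r)), 2))
print(bsc_capacity(0.01), bsc_capacity(0.5))
```

For instance, at r = 2 the limits are Es/N0 > 3 and Eb/N0 > 1.5 on a linear scale, and a BSC with crossover probability 1/2 has zero capacity.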

Linear  codes 

• Linear codes are a popular and well-understood design choice in modern communication systems.
The $2^k$ codewords in an (n, k) binary linear code C form a k-dimensional subspace of the
space of n-dimensional binary vectors, under addition and multiplication over the binary field.
The dual code $C^\perp$ is an (n, n − k) linear code such that each codeword in C is orthogonal (under
binary inner products) to each codeword in $C^\perp$.

• A basis for an (n, k) linear code C can be used to form a generator matrix G. A k-dimensional
information vector u can be encoded into an n-dimensional codeword x using the generator matrix:
x = uG.

• A basis for the dual code $C^\perp$ can be used to form a parity check matrix H satisfying $Hx^T = 0$
for any $x \in C$.

•  The  choices  for  G  and  H  are  not  unique,  since  the  choice  of  basis  for  a  linear  vector  space  is 
not  unique.  Furthermore,  we  may  add  redundant  rows  to  H  to  aid  in  decoding. 

• The number of errors t that a code can be guaranteed to correct satisfies 2t + 1 ≤ dmin, where
dmin is the minimum Hamming distance between codewords. For a linear code, dmin equals the
minimum weight among nonzero codewords, since the all-zero vector is always a codeword, and
since the difference between codewords is a codeword.

•  The  translation  of  codewords  by  error  vectors  can  be  enumerated  in  a  standard  array,  whose 
rows  correspond  to  translations  of  the  entire  code,  termed  cosets,  by  a  given  error  pattern,  termed 
coset  leader.  A  more  compact  representation  lists  only  coset  leaders  and  syndromes,  obtained 
by  operating  the  parity  check  matrix  on  a  given  received  word.  These  can  be  used  to  carry  out 
a  look-up  based  implementation  of  bounded  distance  decoding. 
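
As a small illustration of the points above, the following Python sketch enumerates the codewords x = uG of a binary linear code and finds dmin as the minimum weight among nonzero codewords. The (6,3) generator matrix used here is a hypothetical example chosen for illustration; it is not one of the codes discussed in this chapter, and brute-force enumeration is only practical for small k.

```python
from itertools import product

def encode(u, G):
    """Encode information bits u with generator matrix G over the binary field: x = uG."""
    n = len(G[0])
    return tuple(sum(u[i] * G[i][j] for i in range(len(G))) % 2 for j in range(n))

def codewords(G):
    """List all 2^k codewords (assumes the k rows of G are linearly independent)."""
    k = len(G)
    return [encode(u, G) for u in product((0, 1), repeat=k)]

def minimum_distance(G):
    """Minimum weight among nonzero codewords, which equals dmin for a linear code."""
    return min(sum(x) for x in codewords(G) if any(x))

# Hypothetical (6,3) systematic code, for illustration only
G = [
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]
dmin = minimum_distance(G)
t = (dmin - 1) // 2      # largest t satisfying 2t + 1 <= dmin
print(len(codewords(G)), dmin, t)
```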

Tanner  graphs 

• An (n, k) linear code with r × n (r ≥ n − k) parity check matrix H can be represented by a
Tanner  graph,  with  n  variable  nodes  on  one  side,  and  r  check  nodes  on  the  other  side,  with  an 
edge  between  the  jth  variable  and  ith  check  node  if  and  only  if  H(i,  j)  =  1. 

•  Message  passing  on  the  Tanner  graph  can  be  used  for  iterative  decoding,  which  scales  well 
to  very  large  code  block  lengths.  One  approach  is  to  employ  bit  flipping  algorithms  with  hard 
decision  inputs  and  binary  messages,  but  a  more  powerful  approach  is  to  use  soft  decisions  and 
belief  propagation. 

Soft  decisions  and  belief  propagation 

•  The  messages  passed  between  the  variable  and  check  nodes  are  the  bit  LLRs.  The  message 
going  out  on  an  edge  depends  on  the  messages  coming  in  on  all  the  other  edges. 
• Outgoing messages from a variable node are generated simply by summing LLRs. Outgoing
messages  from  a  check  node  are  more  complicated,  but  can  be  viewed  as  a  product  of  signs,  and 
a  sum  of  reliabilities. 


7.7  Endnotes 

The  material  in  this  chapter  has  been  selected  to  make  two  points:  (a)  information  theory 
provides  fundamental  performance  benchmarks  that  can  be  used  to  guide  parameter  selection 
for  communication  links;  (b)  coding  theory  provides  constructive  strategies  for  approaching  these 
fundamental  benchmarks.  We  now  list  some  keywords  associated  with  topics  that  a  systematic 
exposition  of  information  and  coding  theory  might  cover,  and  then  provide  some  references  for 
further  study. 

Keywords:  A  systematic  study  of  information  theory,  and  its  application  to  derive  theorems  in 
source  and  channel  coding,  includes  concepts  such  as  entropy,  mutual  information,  divergence 
and  typicality.  A  systematic  study  of  the  structure  of  algebraic  codes,  such  as  BCH  and  RS  codes, 
is  required  to  understand  their  construction,  their  distance  properties  and  decoding  algorithms 
such  as  the  Berlekamp-Massey  algorithm.  A  study  of  convolutional  codes,  their  decoding  using 
the  Viterbi  algorithm,  and  their  performance  analysis,  is  another  important  component  of  a 
study  of  channel  coding.  Tight  integration  of  convolutional  codes  with  modulation  leads  to 
trellis  coded  modulation.  Suitably  interleaving  convolutional  codes  leads  to  turbo  codes,  which 
can  be  decoded  iteratively  using  the  forward-backward,  or  BCJR,  algorithm.  LDPC  codes,  which 
can  be  decoded  iteratively  by  message  passing  over  a  Tanner  graph  (as  described  here  and  in 
Software Lab 7.1), are of course an indispensable component in modern communication design.

Further  reading:  One  level  up  from  the  glimpse  provided  here  is  a  self-contained  introduction 
to  “just  enough”  information  theory  to  compute  performance  benchmarks  for  communication 
systems,  and  a  selection  of  constructive  coding  and  decoding  strategies  (including  convolutional, 
turbo,  and  LDPC  codes),  in  the  author’s  graduate  text  [7]  (Chapters  6  and  7).  The  textbook 
by  Cover  and  Thomas  [40]  is  highly  recommended  for  a  systematic  and  lucid  exposition  of 
information theory. Shannon’s beautifully written work [41] establishing the field is also highly
recommended  as  a  source  of  inspiration.  The  textbook  by  McEliece  [42]  is  a  good  source  for  a  first 
exposure  to  information  theory  and  algebraic  coding.  A  detailed  treatment  of  algebraic  coding 
is  provided  by  the  textbook  by  Blahut  [43],  while  comprehensive  treatments  of  channel  coding, 
including  both  algebraic  and  turbo-like  codes,  are  provided  in  the  texts  by  Lin  and  Costello  [44] 
and  Moon  [45]. 


7.8  Problems 

Shannon  limits 

Problem  7.1  Consider  a  coded  modulation  strategy  pairing  a  rate  \  binary  code  with  QPSK. 
Assuming  that  this  scheme  performs  1.5  dB  away  from  the  Shannon  limit,  what  are  the  minimum 
values of Es/N0 (dB) and Eb/N0 (dB) required for the scheme to work?

Problem 7.2 At a BER of $10^{-5}$, how far away are the following uncoded constellations from
the corresponding Shannon limits: QPSK, 8PSK, 16QAM, 64QAM? Use the nearest neighbors
approximation for BER of Gray coded constellations in Section 6.4.

Problem  7.3  Consider  Gray  coded  QPSK,  8PSK,  16QAM,  and  64QAM. 

(a) Assuming that we make ML hard decisions, use the nearest neighbors approximation for
BER of Gray coded constellations in Section 6.4 to plot the BER as a function of Es/N0 (dB)
for each of these constellations.

(b) The hard decisions induce a BSC with crossover probability given by the BERs computed in
(a). Using the BSC capacity formula (7.17), plot the capacity in bits per symbol as a function
of Es/N0 (dB) for each constellation. Also plot for comparison the capacity of the bandlimited
AWGN channel given by (). Comment on the penalty for hard decisions, as well as any other
trends that you see.


Problem  7.4  Consider  a  BICM  system  employing  a  rate  |  binary  code  with  Gray  coded  QPSK 
modulation. 

(a)  What  is  Es/N0  in  terms  of  Eb/N0 ? 

(b)  Based  on  the  AWGN  capacity  region  (7.11),  what  is  the  Shannon  limit  for  this  system  (i.e., 
the  minimum  required  Eb/N0  in  dB)? 

(c)  Now,  consider  the  suboptimal  strategy  of  making  hard  decisions,  thus  inducing  a  BSC.  What 
is  the  Shannon  limit  for  the  system?  What  is  the  degradation  in  dB  due  to  making  hard  decisions? 
Hint: Hard decisions on Gray coded QPSK symbols induce a BSC with crossover probability
$p = Q\left(\sqrt{2E_s/N_0}\right)$, whose capacity is given by (7.17). The Shannon limit is the minimum value
of Eb/N0 for the capacity to be larger than the code rate being used.

Problem  7.5  A  rate  1/2  binary  code  is  employed  using  bit  interleaved  coded  modulation  with 
QPSK,  16QAM,  and  64QAM. 

(a) What are the bit rates attained by these three schemes when operating over a passband
channel of bandwidth 10 MHz (ignore excess bandwidth)?

(b) Assuming that each coded modulation scheme operates 2 dB from the Shannon limit, what
is the minimum value of Es/N0 (dB) required for each of the three schemes to provide reliable
communication?

(c) Suppose that these three schemes are employed in an adaptive modulation strategy which
adapts the data rate as a function of the range, and that the largest attainable range
among the three schemes is 10 km. Assuming inverse square path loss, what are the ranges
corresponding to the other two schemes?

(d)  Now,  if  we  add  binary  codes  of  rate  |  and  |,  plot  the  attainable  bit  rate  versus  range  for 
an  adaptive  modulation  scheme  allowing  all  possible  pairings  of  code  rates  and  constellations. 
Assume  that  each  scheme  is  2  dB  away  from  the  corresponding  Shannon  limit. 

Problem 7.6 (a) Apply L’Hospital’s rule to evaluate the limit of the right-hand side of (7.11) as
r → 0. What is the minimum possible Eb/N0 in dB at which reliable communication is possible
over the AWGN channel?

(b) Re-plot the region for reliable communication shown in Figure 7.6, but this time with spectral
efficiency r (bps/Hz) versus the SNR Es/N0 (dB). Is there any lower limit to Es/N0 below which
reliable communication is not possible? If so, what is it? If not, why not?


Linear  codes  and  bounded  distance  decoding 

Problem  7.7  A  parity  check  matrix  for  the  (7,4)  Hamming  code  is  given  by 

$$H = \begin{pmatrix} 1 & 0 & 0 & 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 & 1 \end{pmatrix} \qquad (7.45)$$

(a)  Find  a  generator  matrix  for  the  code. 

(b) Find the minimum distance of the code. How many errors can be corrected using bounded
distance decoding?

Answer:  dmin  =  3,  hence  a  bounded  distance  decoder  can  correct  one  error. 

(c) Write down the standard array. Comment on any structural differences you see between
this  and  the  standard  array  for  the  (5,  2)  code  in  Table  7.1. 

Answer:  Unlike  in  Table  7.1,  no  binary  vectors  are  “left  over”  after  running  through  the  single 
error  patterns.  The  Hamming  code  is  a  perfect  code:  the  decoding  spheres  of  radius  one  cover 
the  entire  space  of  length-7  binary  vectors.  “Perfect”  in  this  case  just  refers  to  how  well  decoding 
spheres  can  be  packed  into  the  available  space;  it  definitely  does  not  mean  “good,”  since  the 
Hamming  code  is  a  weak  code. 

(d)  Write  down  the  mapping  between  coset  leaders  and  syndromes  for  the  given  parity  check 
matrix  (as  done  in  Table  7.2  for  the  (5,  2)  code). 

Problem 7.8 Suppose that the (7, 4) Hamming code is used over a BSC with crossover probability
p = 0.01. Assuming that bounded distance decoding with decoding radius one is employed,
find  the  probability  of  correct  decoding,  the  probability  of  decoding  failure,  and  the  probability 
of  undetected  error. 

Problem 7.9 Append a single parity check to the (7, 4) Hamming code. That is, given a
codeword $x = (x_1, \ldots, x_7)$ for the (7,4) code, define a new codeword $z = (x_1, \ldots, x_7, x_8)$ by
appending a parity check on the existing code bits:


$$x_8 = x_1 \oplus x_2 \oplus \cdots \oplus x_7$$


This  new  code  is  called  an  extended  Hamming  code. 

(a)  What  are  n  and  k  for  the  new  code? 

(b)  What  is  the  minimum  distance  for  the  new  code? 

Problem 7.10 Hamming codes of different lengths can be constructed using the following
prescription: the columns of the parity check matrix consist of all nonzero binary vectors of
length m, where m is a positive integer.

(a)  What  is  the  value  of  m  for  the  (7, 4)  Hamming  code? 

(b)  For  arbitrary  m,  what  are  the  values  of  code  block  length  n  and  the  number  of  information 
bits  k  as  a  function  of  m? 

Hint:  The  code  block  length  is  the  number  of  columns  in  the  parity  check  matrix.  The  dimension 
of  the  dual  code  is  the  rank  of  the  parity  check  matrix.  Remember  that  row  rank  equals  the 
column  rank.  Which  is  easier  to  find  in  this  case? 

Problem 7.11 BCH codes (named after their discoverers, Bose, Ray-Chaudhuri, and Hocquenghem)
are a popular class of linear codes with a well-defined algebraic structure and well-understood
algorithms for bounded distance decoding. For a positive integer m, we can construct
a binary BCH code which can correct at least t errors with the following parameters:

$$n = 2^m - 1, \quad k \geq n - mt, \quad d_{min} \geq 2t + 1 \qquad (7.46)$$

so that the code rate $R = \frac{k}{n} \geq 1 - \frac{mt}{2^m - 1}$, where the inequality for k is often tight for small values
of t. For example, Hamming codes are actually $(2^m - 1, 2^m - 1 - m)$ BCH codes with t = 1.
Remark: The price of increasing the block length of a BCH code is decoding complexity. Algebraic
decoding of a code of length $n = 2^m - 1$ requires operations over $GF(2^m)$.

(a)  Consider  a  (1023,  923)  BCH  code.  Assuming  that  the  inequality  for  k  is  tight,  how  many 
errors  can  it  correct? 

Answer:  t  =  10. 

(b)  Assuming  that  the  inequality  for  k  in  (7.46)  is  tight,  what  is  the  rate  of  a  BCH  code  with 
n  =  511  and  t  =  10? 
Problem 7.12 Consider an (n, k) linear code used over a BSC with crossover probability
p. The number of errors among n code bits is X ~ Bin(n, p). A bounded distance decoder of
radius t is used to decode it (assume that the code is capable of correcting at least t errors). The
probability of incorrect decoding is therefore given by

$$P_e = P[X > t] = \sum_{k=t+1}^{n} \binom{n}{k} p^k (1 - p)^{n-k} \qquad (7.47)$$

The computation in (7.47) is straightforward, but for large n, numerical problems can arise when
evaluating the terms in the sum directly, because $\binom{n}{k}$ can take very large values, and $p^k$ can
take very small values. One approach to alleviate this problem is to compute the binomial pmf
recursively.

(a) Show that

$$P[X = k] = \frac{p}{1 - p} \, \frac{n - k + 1}{k} \, P[X = k - 1], \quad k = 1, \ldots, n \qquad (7.48)$$

(b) Use the preceding, together with the initial condition $P[X = 0] = (1 - p)^n$, to write a Matlab
program to compute P[X > t].

Problem 7.13 Use the standard array in Table 7.1 for an exact computation of the probabilities
of decoding failure and decoding error for the (5, 2) code, for bounded distance decoding with
t = 1 over a BSC with crossover probability p. Plot these probabilities as a function of p on a
log-log scale.

Hint: Assume that the all-zero codeword is sent. Find the number and weight of error patterns
resulting in decoding failure and decoding error, respectively.

Problem 7.14 For the binomial tail probability (7.47) associated with the probability of incorrect
decoding, we are often interested in large n and relatively small t; for example, consider the
(1023,923)  BCH  code  in  Problem  7.11,  for  which  t  =  10.  While  recursive  computations  as  in 
Problem  7.12  are  relatively  numerically  stable,  we  are  often  interested  in  quick  approximations 
that  do  not  require  the  evaluation  of  a  large  summation  with  (n  —  t )  terms.  In  this  problem,  we 
discuss  some  simple  approximations. 

(a) We are interested in designing systems to obtain small values of Pe, hopefully significantly
smaller than the input BER p. Argue that $p > t/n$ is an uninteresting regime from this point of
view. What is the uninteresting regime for the (1023, 923) BCH code?

(b) For $p \ll t/n$, argue that the sum in (7.47) is well approximated by its first term.

(c) Since X is a sum of n i.i.d. Bernoulli random variables, show that the CLT can be used to
approximate its distribution by a Gaussian: $X \sim N(np, np(1-p))$.

(d) For the (1023, 923) BCH code, compute a numerical estimate of the probability of incorrect
decoding for t = 10 and $p = 10^{-3}$ in three different ways: (i) direct computation, (ii) estimation
by the first term as in (b), (iii) estimation using the Gaussian approximation.

(e) Repeat (d) for $p = 10^{-4}$.

(f) Comment on the match (or otherwise) between the three estimates in (d) and (e). What
happens with smaller p?

Problem  7.15  Here  are  the  (n,  k,  t)  parameters  for  some  other  binary  BCH  codes  for  which  the 
computations  of  Problem  7.12  can  be  repeated:  (1023,  863, 16),  (511, 421, 10),  (255,  215,  5). 

Problem 7.16 Reed-Solomon (RS) codes are a widely used class of codes on non-binary alphabets.
While we do not discuss the algebraic structure of any of the codes we have mentioned, we
state in passing that RS codes can be viewed as a special class of BCH codes. The symbols in
an RS code come from $GF(2^m)$ (a finite field with $2^m$ elements, where m is a positive integer),
hence each symbol can be represented by m bits. The code block length equals $n = 2^m - 1$. The
minimum distance is given by

$$d_{min} = n - k + 1 \qquad (7.49)$$

This is actually the best possible minimum distance attainable for an (n, k) code. It is possible
to extend the RS code by one symbol to obtain $n = 2^m$, and to shorten the code to obtain
$n < 2^m - 1$, all the while maintaining the minimum distance relationship (7.49). Bounded
distance decoding can be used to correct up to $\lfloor \frac{d_{min}-1}{2} \rfloor = \lfloor \frac{n-k}{2} \rfloor$ errors, or $d_{min} - 1 = n - k$ erasures,
or any pattern of t errors and e erasures satisfying $2t + e + 1 \le d_{min} = n - k + 1$. One drawback
of RS codes: it is not possible to obtain code block lengths larger than $2^m$, the alphabet size.

(a)  What  is  the  maximum  number  of  symbol  errors  that  a  (255,  235)  RS  code  can  correct?  How 
many  bits  does  each  symbol  represent?  In  the  worst  case,  how  many  bits  can  the  code  correct? 
How  about  in  the  best  case? 

(b) The (255, 235) RS code is used as an outer code in a system in which the inner code produces
a BER of $10^{-3}$. What is the symbol error probability, assuming that the bit errors are i.i.d.?
Assuming bounded distance decoding up to the maximum possible number of correctable errors,
find the probability of incorrect decoding.

Note: The symbol error probability $p = 1 - (1 - p_b)^m$, where $p_b$ is the BER and m the number
of bits per symbol.

(c) What is the BER that the inner code must produce in order for the (255, 235) RS code to
attain a decoding failure probability of less than $10^{-12}$?

(d) If the BER of the inner code is fixed at $10^{-3}$ and the block length and alphabet size of the
RS code are as in (b)-(c), what is the value of k for which the decoding failure probability is less
than $10^{-12}$?

Remark:  While  we  consider  random  bit  errors  in  this  problem,  inner  decoders  may  often  output 
a  burst  of  errors,  and  this  is  where  outer  RS  codes  become  truly  valuable.  For  example,  a  burst  of 
errors  spanning  30  bits  corresponds  to  at  most  5  symbol  errors  in  an  RS  code  with  8-bit  symbols. 
On  the  other  hand,  correcting  up  to  30  errors  using,  say,  a  binary  BCH  code  would  cost  a  lot  in 
terms  of  redundancy. 
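
The burst example in the remark can be checked by brute force. The following sketch (the variable names are our own, introduced only for illustration) slides a 30-bit burst over every possible alignment relative to the 8-bit symbol boundaries and records the largest number of symbols it touches.

```matlab
% Sketch: worst-case number of RS symbols hit by a burst of bit errors.
% Values taken from the remark: 8-bit symbols, burst spanning 30 bits.
m = 8;
burst_len = 30;

max_symbols = 0;
for offset = 0:m-1
    % offset = position of the first burst bit within its symbol
    first_symbol = floor(offset/m);
    last_symbol = floor((offset + burst_len - 1)/m);
    max_symbols = max(max_symbols, last_symbol - first_symbol + 1);
end
fprintf('burst of %d bits touches at most %d symbols\n', burst_len, max_symbols);
```

A decoder that corrects 5 symbol errors therefore handles any such burst, whatever its alignment.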


LLR  computations 


Problem 7.17 Consider a BPSK system with a typical received sample given by

$$Y = A(-1)^x + N \qquad (7.50)$$

where A > 0 is the amplitude, $x \in \{0, 1\}$ is the transmitted bit, and $N \sim N(0, \sigma^2)$ is the noise.
Let $\pi_0 = P[x = 0]$ denote the prior probability that x = 0.

(a) Show that the LLR

$$L(x) = \log \frac{P[x = 0 \mid y]}{P[x = 1 \mid y]} = \log \frac{\pi_0 \, p(y|0)}{(1 - \pi_0) \, p(y|1)}$$

Conclude that

$$L(x) = L_{channel}(x) + L_{prior}(x)$$

where $L_{channel}(x) = \log \frac{p(y|0)}{p(y|1)}$ and $L_{prior}(x) = \log \frac{\pi_0}{1 - \pi_0}$.

(b) Write down the conditional densities p(y|x = 0) and p(y|x = 1).

(c) Show that the channel LLR $L_{channel}(x)$ is given by

$$L_{channel}(x) = \log \frac{p(y|0)}{p(y|1)} = \frac{2Ay}{\sigma^2}$$

(d) Specify (in terms of the parameters A and $\sigma$) the conditional distribution of $L_{channel}$,
conditioned on x = 0 and x = 1.

(e) Suppose that the preceding is used to model either the I sample or Q sample of a Gray coded
QPSK system. Express $E_s/N_0$ for the system in terms of A and $\sigma$.

Answer: $\frac{E_s}{N_0} = \frac{A^2}{\sigma^2}$.

(f) Suppose that we use BICM using a binary code of rate $R_{code}$ prior to QPSK modulation.
Express $E_s/N_0$ for the QPSK symbols in terms of $E_b/N_0$.

Answer: $\frac{E_s}{N_0} = 2 R_{code} \frac{E_b}{N_0}$.

(g) For $E_b/N_0$ of 3 dB and a rate 2/3 binary code, what is the value of A if the noise variance
per dimension is scaled to $\sigma^2 = 1$?

(h) For the system parameters in (g), specify numerical values for the parameters governing the
conditional distributions of the LLR found in (c).

(i) For the system parameters in (g), specify the probability of bit error for hard decisions based
on Y.


Problem 7.18 In this problem, we derive the tanh rule (7.42) for the check update, hopefully
in a way that provides some insight into where the tanh comes from.

(a) For any bit x with LLR L, we have observed that $P[x = 0] = \frac{e^L}{e^L + 1}$. Now, show that

$$\delta = P[x = 0] - \frac{1}{2} = \frac{1}{2} \tanh(L/2) \qquad (7.51)$$

Thus, the tanh provides a measure of how much the distribution of x deviates from an
equiprobable distribution.

Now, suppose that $x_3 = x_1 \oplus x_2$, where $x_1$ and $x_2$ are modeled as independent for the purpose of
belief propagation. Let $L_i$ denote the LLR for $x_i$, and set $P[x_i = 0] - \frac{1}{2} = \delta_i$, i = 1, 2, 3. (Note
that $P[x_i = 1] = \frac{1}{2} - \delta_i$.) Under our model,

$$P[x_3 = 0] = P[x_1 = 0] P[x_2 = 0] + P[x_1 = 1] P[x_2 = 1]$$

(b) Plug in expressions for these probabilities in terms of the $\delta_i$ and simplify to show that

$$\delta_3 = 2 \delta_1 \delta_2$$

(c) Use the result in (a) to infer the tanh rule

$$\tanh(L_3/2) = \tanh(L_1/2) \tanh(L_2/2)$$
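
The tanh rule is easy to sanity-check numerically. In the sketch below, the two input LLR values are arbitrary assumptions. The LLR of the modulo-2 sum is computed once directly from the probabilities and once from the tanh rule.

```matlab
% Sketch: numerical check of the tanh rule for x3 = x1 xor x2.
% The input LLRs are arbitrary example values (assumed).
L1 = 1.2;
L2 = -0.7;

P1 = exp(L1)/(1 + exp(L1));        % P[x1 = 0]
P2 = exp(L2)/(1 + exp(L2));        % P[x2 = 0]
P3 = P1*P2 + (1 - P1)*(1 - P2);    % P[x3 = 0] under independence

L3_direct = log(P3/(1 - P3));                   % LLR from the definition
L3_tanh = 2*atanh(tanh(L1/2)*tanh(L2/2));       % LLR from the tanh rule
fprintf('direct: %g, tanh rule: %g\n', L3_direct, L3_tanh);
```

Since |tanh| is at most one, the output LLR has magnitude no larger than that of the least reliable input, which matches the intuition that a parity check is only as reliable as its weakest bit.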


-3A      -A      +A      +3A
 00      10      11       01

Figure 7.17: Gray coded 4PAM constellation.


Problem 7.19 Consider the Gray coded 4PAM constellation depicted in Figure 7.17. Denote
the label for each constellation point by $x_1 x_2$, where $x_1, x_2 \in \{0, 1\}$. The received sample is given
by

$$Y = s + N$$

where $s \in \{-3A, -A, A, 3A\}$ is the transmitted symbol, and $N \sim N(0, \sigma^2)$ is noise.

(a) Find expressions for the channel LLRs for the two bits:

$$L_{channel}(x_1) = \log \frac{p(y|x_1 = 0)}{p(y|x_1 = 1)}, \qquad L_{channel}(x_2) = \log \frac{p(y|x_2 = 0)}{p(y|x_2 = 1)}$$

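Hint: Note that, with $x_2$ equally likely to be 0 or 1,

$$p(y|x_1 = 0) = \frac{1}{2} \left[ p(y|x_1 x_2 = 00) + p(y|x_1 x_2 = 01) \right]$$

(the common factor of 1/2 cancels in the LLR).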

(b) Simulate the system for A = 2, normalizing $\sigma = 1$, and choosing the bits $x_1$ and $x_2$
independently and with equal probability from {0, 1}. Plot the histogram for $LLR_1$ conditioned on
$x_1 = 0$ and conditioned on $x_1 = 1$ on the same plot. Plot the histogram for $LLR_2$ conditioned
on $x_2 = 0$ and conditioned on $x_2 = 1$ on the same plot. Are the conditional distributions in each
case well separated?

(c) You wish to design a BICM system with a binary code of rate $R_{code}$ to be used with 4PAM
modulation with A and $\sigma$ as in (b). Use the formula (7.7) for the discrete time AWGN channel
to estimate the code rate to be used, assuming that you can operate 3 dB from the Shannon
limit.

Hint: Compute the SNR in terms of A and $\sigma$, but then reduce by 3 dB before plugging into (7.7)
to find the bits per channel use.

(d) Repeat (b) and (c) for A = 1, $\sigma = 1$.


Software  Lab  7.1:  Belief  propagation 

Lab Objectives: The purpose of this lab is to provide hands-on experience with belief propagation
(BP) for decoding. As a warm-up exercise, we first apply BP to our running example (5, 2)
code, for which we can compare its performance against bounded distance decoding. We then
introduce array codes, a class of LDPC codes with a simple deterministic construction which
can provide excellent performance; while the performance of the array codes we consider here
is inferior to that of the best available LDPC codes, the gap can be narrowed considerably by
tweaking them (discussion of such modifications is beyond our scope).

Reading:  Sections  7.4  (linear  codes)  and  7.5  (belief  propagation). 

Lab  Assignment 

1) Write a function implementing belief propagation. The inputs are the parity check matrix, the
channel LLRs, and the maximum number of iterations. The outputs are a binary vector which
is an estimate of the transmitted codeword, a bit indicating whether this binary vector is a
valid codeword, and the number of iterations actually taken. To be concrete, we start defining
the function below.

function [xhat,valid_codeword,iter] = belief_propagation(H,Lchannel,max_iter)
%%INPUTS
%H = parity check matrix
%Lchannel = LLRs obtained from channel
%max_iter = maximum allowed number of iterations
%%OUTPUTS
%xhat = binary vector (estimate of transmitted codeword)
%valid_codeword = 1 if xhat is a codeword
%iter = number of iterations taken to decode
%%NEED TO FILL IN THE FUNCTION NOW




One  possible  approach  to  filling  in  the  function  is  to  take  the  following  steps: 

(a)  Build  the  Tanner  graph:  Given  the  parity  check  matrix  H,  find  and  store  the  neighbors  for 
each  variable  node  and  each  check  node.  This  can  be  done  using  a  cell  array,  as  follows. 

%determine number of nodes on each side of the Tanner graph
[number_check_nodes,n] = size(H);
%Store indices of edges from variable to check nodes
variables_edges_index = cell(n,1);
for j=1:n
    variables_edges_index{j} = find(H(:,j)==1);
end
%Store indices of edges from check to variable nodes
check_node_edges_index = cell(number_check_nodes,1);
for i=1:number_check_nodes
    check_node_edges_index{i} = find(H(i,:)==1)';
end

(b)  Build  the  message  data  structure:  We  can  maintain  messages  (LLRs)  in  a  matrix  of  the 
same  dimension  as  H,  with  nonzero  entries  only  where  H  is  nonzero.  The  jth  variable  node 
will  read/write  its  messages  from/to  the  jth  column,  while  the  ith  check  node  will  read/write 
its messages from/to the ith row. Initialize messages from variable nodes to the channel LLRs,
and  from  check  nodes  to  zeros.  We  maintain  two  matrices,  one  corresponding  to  messages  from 
variable  nodes,  and  one  corresponding  to  messages  from  check  nodes. 

%messages from variable nodes
Lout_variables = H.*repmat(Lchannel',number_check_nodes,1);
%messages from check nodes
Lout_check_nodes = zeros(size(H));

(c) Implement message passing: We can now use the variable update and check update functions
(code fragments 7.5.1 and 7.5.2 respectively), along with the preceding data structure, to
implement message passing.

%initialize message passing
valid_codeword = 0; %indicates valid codeword found
iter = 0;
while (iter<max_iter && ~valid_codeword)
    %loop over check nodes to generate messages
    for i=1:number_check_nodes
        Lout_check_nodes(i,check_node_edges_index{i}) = check_update(Lout_variables(i,check_node_edges_index{i}));
    end

    %loop over variable nodes
    for j=1:n
        Lout_variables(variables_edges_index{j},j) = variable_update(Lchannel(j),Lout_check_nodes(variables_edges_index{j},j));
    end

    %check for valid codeword
    bhat = sign(sum(Lout_check_nodes)' + Lchannel); %hard decisions +1,-1
    xhat = (1-bhat)/2; %convert hard decisions from {+1,-1} to {0,1}
    if (mod(H*xhat,2)==zeros(number_check_nodes,1))
        valid_codeword = 1;
    end
    iter = iter + 1;
end

Putting (a)-(c) together gives the desired function.

2) Write a program to check that the preceding belief propagation function works for our example
(5, 2) code, using the parity check matrix corresponding to the Tanner graph in Figure 7.13, and
generating the channel LLRs as described in Problem 7.17. Specifically, consider Gray coded
QPSK modulation, where the I and Q components follow the BPSK model in Problem 7.17.
Note that A and $\sigma$ in Problem 7.17 must be chosen appropriately (fix one, say $\sigma = 1$, and scale
the other) based on the spectral efficiency r (r = 4/5 for QPSK with the (5, 2) code) and $E_b/N_0$.
Assume, without loss of generality, that the all-zero codeword is sent. Decoding error therefore
occurs when the belief propagation function returns a nonzero codeword, or reports that a valid
codeword was not found after the maximum allowed number of iterations.

3) Use simulations to estimate and plot the probability of decoding error (log scale) with BP
as a function of $E_b/N_0$ (dB). On the same graph, also plot the probability of decoding error for
bounded distance decoding with hard decisions (this can be computed analytically, as described
in Problem 7.12), and the probability of error for uncoded QPSK. Comment on the results.
Does BP with soft decisions provide an improvement over bounded distance decoding? Is the
performance better than that of uncoded QPSK? For your reference, an example unlabeled plot
is provided in Figure 7.18. Guess the labels for the three plots before verifying them using your
own computations and simulations.


Figure  7.18:  Performance  of  the  (5,  2)  code  with  QPSK  modulation,  comparing  belief  propagation 
with  soft  decisions  against  bounded  distance  decoding  with  hard  decisions.  Also  plotted  for 
comparison  is  the  performance  of  uncoded  QPSK.  Which  curve  is  which? 


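Array codes: We now introduce the class of array codes, whose parity check matrix is characterized
by three positive integers (p, J, L), and is of the following form:

$$H = \begin{bmatrix} I & I & I & \cdots & I \\ I & P & P^2 & \cdots & P^{L-1} \\ I & P^2 & P^4 & \cdots & P^{2(L-1)} \\ \vdots & \vdots & \vdots & & \vdots \\ I & P^{J-1} & P^{2(J-1)} & \cdots & P^{(J-1)(L-1)} \end{bmatrix} \qquad (7.52)$$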


where I denotes a p x p identity matrix, and p is a prime number. The matrix P is obtained by
cyclically shifting the rows of I by one. Thus, for p = 3, we have

$$I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad P = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$$

The matrix P is a permutation matrix, in the sense that for any p x 1 vector $u = (u_1, \ldots, u_p)^T$,
the vector Pu is a permutation of u. For this choice of P, the vector $Pu = (u_p, u_1, \ldots, u_{p-1})^T$
is a cyclic shift of u by one. Raising P to an integer power k simply corresponds to applying k
successive cyclic shifts, so that $P^k$ is a cyclic shift of the rows of I by k, and $P^k u$ is a cyclic shift
of u by k.
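
These properties of P can be verified directly in Matlab. The sketch below assumes that "shifting the rows of I by one" means moving each row down by one position (with wraparound), which is the convention consistent with the expression for Pu given above; the example vector u is an arbitrary choice.

```matlab
% Sketch: the cyclic shift matrix P and its powers for p = 3.
p = 3;
I = eye(p);
P = circshift(I, [1 0]);   % rows of I shifted down by one (with wraparound)

u = (1:p)';                % example vector (u_1,...,u_p)^T = (1,2,3)^T
Pu = P*u;                  % cyclic shift of u by one: (u_p,u_1,...,u_{p-1})^T

k = 2;
Pk = P^k;                  % k successive cyclic shifts
Pku = Pk*u;                % cyclic shift of u by k

disp(P); disp(Pu'); disp(Pku');
```

Note that p successive shifts return every vector to its starting point, so that $P^p = I$ and only the exponent modulo p matters.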

The parity check matrix H in (7.52) consists of JL blocks of size p x p, with the (j, l)th block being
$P^{(j-1)(l-1)}$, $1 \le j \le J$, $1 \le l \le L$. The code length n = pL, the number of columns (or variable
nodes). The column weight equals J for each column (make sure you check this); that is, each
variable node has degree J. The number of rows (or check nodes) equals pJ, but some of these
rows may be redundant, so the dimension of the dual code $n - k \le pJ$. In fact, it can be shown
that exactly J - 1 rows are redundant. (To see why this might be true, add the first p rows, and
then the next p rows. What answers do you get? What does this tell you about the number of
linearly independent rows?) Thus, the rank of H equals n - k = pJ - (J - 1), so that the code
dimension k = p(L - J) + J - 1. We therefore summarize as follows:

$$n = pL, \quad k = p(L - J) + J - 1 \quad \text{for a } (p, J, L) \text{ array code} \qquad (7.53)$$

Popular choices for the variable node degree are J = 3, 4. Analysis of code properties shows that we
should restrict $L \le p$. We can, for example, use a large prime p and moderate sized L, or set
L = p for a relatively small value of p.
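
The structural claims above (code length, column weight, and the row sums suggested in the parenthetical remark) can be explored numerically on a small example. The following is a minimal sketch under stated assumptions: the parameter values are illustrative, and P is taken to shift the rows of I down by one, consistent with the expression for Pu given earlier.

```matlab
% Sketch: build a small (p,J,L) array code parity check matrix block by block
% and check its basic structure. Parameter values are illustrative assumptions.
p = 5; J = 3; L = 4;
I = eye(p);
H = zeros(p*J, p*L);
for j = 1:J
    for l = 1:L
        rows = (j-1)*p + (1:p);
        cols = (l-1)*p + (1:p);
        shift = mod((j-1)*(l-1), p);            % exponent of P, modulo p
        H(rows, cols) = circshift(I, [shift 0]); % the block P^((j-1)(l-1))
    end
end

column_weights = sum(H, 1);                       % should all equal J
first_block_sum = mod(sum(H(1:p,:), 1), 2);       % sum of first p rows, mod 2
second_block_sum = mod(sum(H(p+1:2*p,:), 1), 2);  % sum of next p rows, mod 2
disp(size(H)); disp(unique(column_weights));
```

Comparing first_block_sum and second_block_sum shows a linear dependence between the first two groups of p rows, which is the observation behind the count of redundant rows.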

4) Write a function to generate the parity check matrix of an array code, whose inputs are p, J, L
and outputs are H, n, k.


function [H,n,k] = array_code(p,J,L)
%Generates the parity check matrix for an array code
%%INPUTS
%p is a prime
%J = variable node degree (column weight), usually set to 3 or 4
%L = parameter <= p that determines code length
%%OUTPUTS
%H = parity check matrix
%n = pL (code length)
%k = p(L-J)+J-1 (number of info bits)
%p times p identity matrix
Iblock = eye(p);
%can use Matlab's circshift operation on Iblock to generate P and its powers
%for example, circshift(Iblock,[0 (j-1)*(l-1)]) generates the (j,l)th block of H
%%NOW FILL IN THE FUNCTION!%%%

5) Consider an array code with p = 11, with L = p and J = 4, used as before with Gray coded
QPSK and BICM. As before, use simulations to estimate and plot the probability of decoding
error (log scale) with BP as a function of $E_b/N_0$ (dB) for a BICM system employing QPSK.
Compare the performance ($E_b/N_0$ for decoding error probability of $10^{-4}$) with the Shannon
limit for that spectral efficiency. To limit the simulation cost, you may wish to use a relatively
small number of simulation runs to generate your plots and to estimate the value of $E_b/N_0$ at which the
probability of decoding error starts falling below, say, $10^{-2}$, and then use a larger number of runs
for a few carefully chosen values of $E_b/N_0$ to see when the decoding error probability hits $10^{-4}$.
How does this $E_b/N_0$ compare with that required for $10^{-4}$ BER with uncoded QPSK?

6) Repeat 5) for larger values of the prime number p (still keeping L = p and J = 4), within the
limits of your computational infrastructure. For example, try p = 47.

7) Repeat 5) with large p and relatively small L; for example, p = 911 and L = 8, still keeping
J = 4. How do the code rate and spectral efficiency (with QPSK) compare with 5) and 6)?

Lab Report: Your lab report should answer the preceding questions in order, and should document
the reasoning you used and the difficulties you encountered. Comment on the decoding error
probability trends as you vary the code parameters.

Chapter 8

Dispersive  Channels  and  MIMO 


From  the  material  in  Chapters  4-6,  we  now  have  an  understanding  of  commonly  used  modulation 
formats,  noise  models,  and  optimum  demodulation  for  the  AWGN  channel  model.  Chapter  7 
discusses  channel  coding  strategies  for  these  idealized  models.  In  this  final  chapter,  we  discuss 
more  sophisticated  channel  models,  and  the  corresponding  signal  processing  schemes  required  at 
the  demodulator. 

We  first  consider  the  following  basic  model  for  a  dispersive  channel:  the  transmitted  signal  passes 
through  a  linear  time-invariant  system,  and  is  then  corrupted  by  white  Gaussian  noise.  The  LTI 
model  is  broadly  applicable  to  wireline  channels,  including  copper  wires,  cable  and  fiber  optic 
communication  (at  least  over  shorter  distances,  over  which  fiber  nonlinearities  can  be  neglected), 
as  well  as  to  wireless  channels  with  quasi-stationary  transmitters  and  receivers.  For  wireless 
mobile  channels,  the  LTI  model  is  a  good  approximation  over  durations  that  are  small  compared 
to  the  time  constants  of  mobility,  but  still  fairly  long  on  an  electronic  timescale  (e.g.,  of  the  order 
of  milliseconds).  Methods  for  compensating  for  the  effects  of  a  dispersive  channel  are  generically 
termed  equalization.  We  introduce  two  common  design  approaches  for  this  purpose. 

The  first  approach  is  singlecarrier  modulation,  which  refers  to  the  linear  modulation  schemes 
discussed  in  Chapter  4,  where  the  symbol  sequence  modulates  a  transmit  pulse  occupying  the 
entire  available  bandwidth.  We  discuss  linear  zero  forcing  (ZF)  and  Minimum  Mean  Squared 
Error (MMSE) equalization techniques, which are suboptimal from the point of view of minimizing
error probability, but are intuitively appealing and less computationally complex than
optimum  equalization.  (We  refer  the  reader  to  more  advanced  texts  for  discussion  of  optimum 
equalization  and  its  performance  analysis.)  We  discuss  adaptive  implementation  and  geometric 
interpretation  for  linear  equalizers. 

The second approach to channel dispersion is Orthogonal Frequency Division Multiplexing (OFDM),
where linear modulation is applied in parallel to a number of subcarriers, each of which occupies
a bandwidth which is small compared to the overall bandwidth. OFDM may be viewed as a
mechanism for ISI avoidance. It is based on the observation that any complex exponential $e^{j 2\pi f_0 t}$
passes through any LTI system with transfer function H(f) unchanged except for multiplication
by $H(f_0)$. Thus, we can send a number of complex exponentials $\{e^{j 2\pi f_i t}\}$, termed subcarriers,
in parallel through the channel, each multiplied by an information-bearing symbol, such that
interference across subcarriers is avoided. The task of channel equalization therefore reduces to
compensating separately for the channel gains $H(f_i)$ for each such subcarrier. Parallelizing the
problem of equalization in this manner is particularly attractive when the underlying time domain
impulse response h(t) is complicated (e.g., an indoor wireless channel where there are a large
number of paths with multiple bounces off walls and ceilings between transmitter and receiver).
We discuss how this intuition is translated into practice via transceiver implementations based on
digital signal processing (DSP).
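
The eigenfunction property underlying OFDM can be illustrated with a short discrete-time experiment. The sketch below is a sampled analog of the continuous-time statement above, and its impulse response taps and normalized frequency are assumed example values. Once the filter's initial transient has passed, the output equals the input complex exponential scaled by the transfer function evaluated at the input frequency.

```matlab
% Sketch: a complex exponential passes through an LTI filter unchanged
% except for a complex gain. All numerical values are assumed examples.
h = [0.8; -0.7; -0.6];          % example impulse response (3 taps)
f0 = 0.1;                       % normalized frequency in cycles/sample
n = (0:199)';                   % sample indices
x = exp(1j*2*pi*f0*n);          % input complex exponential

y = conv(x, h);                 % filter output (includes edge transients)

% transfer function evaluated at f0: H(f0) = sum_k h[k] exp(-j 2 pi f0 k)
H_f0 = sum(h .* exp(-1j*2*pi*f0*(0:length(h)-1)'));

% compare output to input over the steady-state portion
steady = (length(h):length(n))';
ratio = y(steady) ./ x(steady);
max_dev = max(abs(ratio - H_f0));
fprintf('max deviation of y/x from H(f0): %g\n', max_dev);
```

The samples near the edges, where the filter memory is only partially filled, do not satisfy this relation.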

Finally, we discuss multiple antenna communication, also popularly known as Multiple Input
Multiple Output (MIMO), or space-time, communication. There is a great deal of commonality
between signal processing for dispersive channels and for MIMO, which is why we treat these
topics within the same chapter. Furthermore, the combination of OFDM with MIMO allows
parallelization of transceiver signal processing for complicated channels, and has become the
architecture of choice for both WiFi (the IEEE 802.11n standard) and for fourth generation
cellular systems (LTE, or long term evolution). Three key concepts for MIMO are covered:
beamforming  (directing  energy  towards  a  desired  communication  partner),  diversity  (combating 
fading  by  using  multiple  paths  from  transmitter  to  receiver),  and  spatial  multiplexing  (using 
multiple  antennas  to  support  parallel  data  streams). 

Chapter Plan: Compared to the earlier chapters, this chapter has a somewhat unusual organization.
For dispersive channels, a key goal is to provide hands-on exposure via software labs.
A  model  for  singlecarrier  linear  modulation  over  a  dispersive  channel,  including  code  fragments 
for  modeling  the  transmitter  and  the  channel,  is  presented  in  Section  8.1.  Linear  equalization 
is  discussed  in  Section  8.2.  Sections  8.1  and  8.2.1  provide  just  enough  background,  including 
code  fragments,  for  Software  Lab  8.1  on  adaptive  implementation  of  linear  equalization.  Section 
8.2.2  provides  geometric  insight  into  why  the  implementation  in  Software  Lab  8.1  works,  and 
provides  a  framework  for  analytical  computations  related  to  MMSE  equalization  and  the  closely 
related notion of zero-forcing (ZF) equalization. It is not required for actually doing Software
Lab  8.1.  The  key  concepts  behind  OFDM  and  its  DSP-centric  implementation  are  discussed  in 
Section  8.3,  whose  entire  focus  is  to  provide  background  for  developing  a  simplified  simulation 
model  for  an  OFDM  link  in  Software  Lab  8.2.  Finally,  MIMO  is  discussed  in  Section  8.4,  with 
the  signal  processing  concepts  for  MIMO  communication  reinforced  by  Software  Lab  8.3.  The 
problems  at  the  end  of  this  chapter  focus  on  linear  equalization  concepts  discussed  in  Section 
8.2.2,  and  on  performance  evaluation  of  core  MIMO  techniques  (beamsteering,  diversity  and 
spatial  multiplexing)  discussed  in  Section  8.4. 

Software:  As  already  mentioned,  this  chapter  is  structured  to  give  an  exposure  to  advanced 
concepts  through  the  associated  software  labs:  Software  Lab  8.1  for  singlecarrier  modulation 
over  dispersive  channels,  Software  Lab  8.2  for  OFDM,  and  Software  Lab  8.3  for  MIMO  signal 
processing. 


8.1  Singlecarrier  System  Model 

We  first  provide  a  system-level  overview  of  singlecarrier  linear  modulation  over  a  dispersive 
channel.  Figure  8.1  shows  block  diagrams  corresponding  to  a  typical  DSP-centric  realization  of 
the  transceiver.  The  DSP  operations  are  performed  on  digital  streams  at  an  integer  multiple 
of  the  symbol  rate,  denoted  by  m/T.  For  example,  we  might  choose  m  =  4  for  implementing 
the transmit and receive filters, but we might subsample the output of the receive filter down
to  m  =  2  before  implementing  an  equalizer.  We  model  the  core  components  of  such  a  system 
using  the  complex  baseband  representation,  as  shown  in  Figure  8.2.  Given  the  equivalence  of 
passband  and  complex  baseband,  we  are  only  skipping  modeling  of  finite  precision  effects  due 
to  digital-to-analog  conversion  (DAC)  and  analog-to-digital  conversion  (ADC).  These  effects  can 
easily  be  incorporated  into  models  such  as  those  we  develop,  but  are  beyond  our  current  scope. 

We  focus  on  a  hands-on  development  of  the  key  ideas  using  discrete  time  simulation  models, 
illustrated  by  code  fragments. 


8.1.1  Signal  Model 

We begin with an example of linear modulation, to see how ISI arises and can be modeled.
Consider linear modulation using BPSK with a sine pulse, which leads to a transmitted baseband
waveform shown in Figure 8.3(a). The Matlab code used for generating this plot is given below.
We have sampled much faster than the symbol rate (at 32/T) in order to obtain a smooth plot.
In practice, we would typically sample at a smaller multiple of the symbol rate (e.g. at 4/T) to
generate the input to the DAC in Figure 8.1.

Figure 8.1: Typical DSP-centric transceiver realization. Our model does not include the blocks
shown in dashed lines. Finite precision effects due to digital to analog conversion (DAC) and
analog to digital conversion (ADC) are not considered. The upconversion and downconversion
operations are not modeled. The passband channel is modeled as an LTI system in complex
baseband.

Figure 8.2: Block diagram of a linearly modulated system, modeled in complex baseband.


(a)  Transmit  filter  output.  (b)  Receive  filter  output. 


Figure  8.3:  The  outputs  of  the  transmit  and  receive  filter  without  channel  dispersion.  The 
symbols  can  be  read  off  from  sampling  each  waveform  at  the  times  indicated  by  the  stem  plot. 

We  provide  Matlab  code  fragments  that  convey  the  concepts  underlying  discrete-time  modeling 
and  implementation.  The  code  fragments  also  show  how  some  of  the  plots  here  are  generated, 
with  cosmetic  touches  omitted. 

The  following  code  fragment  shows  how  to  work  with  discrete  time  samples  using  oversampling 
at  rate  m/T,  including  how  to  generate  the  plot  of  the  transmitted  waveform  in  Figure  8.3(a). 

Code  Fragment  8.1.1  (Transmitted  waveform) 

%choose large oversampling factor for smooth plots
oversampling_factor = 32;
m = oversampling_factor; %for brevity
%generate sine pulse
time_over_symbol = cumsum(ones(m,1))-1;
transmit_filter = sin(time_over_symbol*pi/m);
%number of symbols
nsymbols = 10;
%BPSK symbol generation
symbols = sign(rand(nsymbols,1) -.5);
%express symbol sequence at oversampled rate using zeropadding
%(starts and ends with nonzero symbols)
Lpadded = m*(nsymbols-1)+1; %length of zeropadded sequence
symbolspadded = zeros(Lpadded,1); %initialize
symbolspadded(1:m:Lpadded) = symbols; %fill in bit values every m entries
%now all convolutions can be performed in oversampled domain
transmit_output = conv(symbolspadded,transmit_filter);
%plot transmitted waveform and sampling times
t1 = (cumsum(ones(length(transmit_output),1))-1)/m; %time axis as a column vector
figure;
plot(t1,transmit_output,'b');
xlabel('t/T');
hold on;
%choose sampling times in accordance with peak of transmit filter response
[maxval,maxloc] = max(transmit_filter); %find peak location
sampling_times = maxloc:m:(nsymbols-1)*m+maxloc;
sampled_outputs = transmit_output(sampling_times);
stem((sampling_times-1)/m,sampled_outputs,'r')
hold off;

If  this  waveform  now  goes  through  an  ideal  channel,  and  we  use  a  receive  filter  with  impulse 
response  matched  to  the  transmitted  pulse,  then  the  waveform  we  obtain  is  shown  in  Figure 
8.3(b). The transmit filter impulse response is time limited to length T and is therefore square root
Nyquist (see Chapter 4), so that the net response to a single symbol, which is the cascade of the
transmit filter with its matched filter, is Nyquist. It follows that, by sampling at the right
moments (as marked on the plot), we can recover the symbols exactly.
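
The recovery claim can be checked numerically. The following is a minimal sketch, not one of the book's code fragments: it reuses the sine pulse and zeropadding of code fragment 8.1.1, while the names peakval, peakloc, samples, max_deviation and recovered are illustrative assumptions. It passes the transmitted waveform through the matched filter over an ideal channel and compares the symbol-spaced samples with the transmitted symbols.

```matlab
%sketch: matched filtering of the sine pulse yields ISI-free symbol-rate samples
m = 32; %oversampling factor, as in code fragment 8.1.1
nsymbols = 10;
transmit_filter = sin((0:m-1)'*pi/m);
symbols = sign(rand(nsymbols,1)-0.5);
symbols(symbols==0) = 1; %guard against the (probability zero) event rand == 0.5
symbolspadded = zeros(m*(nsymbols-1)+1,1);
symbolspadded(1:m:end) = symbols;
transmit_output = conv(symbolspadded,transmit_filter);
%matched filter and its output over an ideal channel (normalized by m)
receive_filter = flipud(transmit_filter);
receive_output = conv(transmit_output,receive_filter)/m;
%net response to a single symbol, and its peak
rx_pulse = conv(transmit_filter,receive_filter)/m;
[peakval peakloc] = max(rx_pulse);
%sample at the peak and at multiples of the symbol interval thereafter
samples = receive_output(peakloc:m:(nsymbols-1)*m+peakloc);
%each sample should equal the peak value times the corresponding symbol
max_deviation = max(abs(samples-peakval*symbols));
recovered = isequal(sign(samples),symbols);
disp(max_deviation);
disp(recovered);
```

The net response has length 2m-1 samples, so samples taken m apart around its peak see only one symbol each; this is the discrete-time counterpart of the Nyquist property.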

We  now  provide  a  code  fragment  to  model  the  channel  and  receive  filter;  it  can  be  employed  for 
modeling  both  ideal  and  dispersive  channels.  Appending  it  to  code  fragment  8.1.1  generates  and 
plots  the  noiseless  received  waveform. 

Code  Fragment  8.1.2  (Modeling  the  channel  and  receive  filter) 

dispersive = 0; %set this to 0 for ideal channel, and to 1 for dispersive channel
if dispersive == 0,
channel = 1;
else
channel = [0.8;zeros(m/2,1);-0.7;zeros(m,1);-0.6];
%(or substitute your favorite choice of dispersive channel)
end
%noiseless receiver input
receive_input = conv(transmit_output,channel);
t2 = (cumsum(ones(length(receive_input),1))-1)/m;
figure;
plot(t2,receive_input);
xlabel('t/T');
%receive filter matched to transmit filter
%(would also need to conjugate if complex-valued)
receive_filter = flipud(transmit_filter);
%receive filter output (normalized to account for oversampling)
receive_output = (1/m)*conv(receive_input,receive_filter);
t3 = (cumsum(ones(length(receive_output),1))-1)/m;
%plot receive filter output together with sample locations chosen based on peak of net response
figure;
plot(t3,receive_output,'b');
xlabel('t/T');
hold on;
%effective pulse at channel output
pulse = conv(transmit_filter,channel);
%effective pulse at receive filter output (normalized to account for oversampling)
rx_pulse = conv(pulse,receive_filter)/m;
[maxval maxloc] = max(rx_pulse);
rx_sampling_times = maxloc:m:(nsymbols-1)*m+maxloc;
rx_sampled_outputs = receive_output(rx_sampling_times);
stem((rx_sampling_times-1)/m,rx_sampled_outputs,'r');
hold off;

(a) Dispersive channel. (b) Receive filter output.

Figure  8.4:  When  the  transmitted  waveform  passes  through  the  dispersive  channel  shown,  we 
can  no  longer  read  off  the  symbols  reliably  by  sampling  the  output  of  the  receive  filter.  For  this 
particular  set  of  symbols,  one  of  the  symbols  is  estimated  incorrectly,  even  though  there  is  no 
noise. 

Figure  8.4  shows  a  dispersive  channel  and  the  corresponding  noiseless  receive  filter  output.  The 
effective  pulse  given  by  the  cascade  of  the  transmit,  channel  and  receive  filters  is  no  longer 
Nyquist,  hence  we  do  not  expect  a  symbol  decision  based  on  a  single  sample  to  be  reliable. 
Figure  8.4(b)  shows  the  severe  distortion  due  to  ISI  with  a  “best  effort”  choice  of  sampling  times 
(chosen  based  on  the  peak  of  the  effective  pulse).  In  particular,  for  the  specific  symbol  sequence 
shown,  one  (out  of  ten)  of  the  symbol  estimates  obtained  by  taking  the  signs  of  these  samples  is 
incorrect. 


Figure  8.5:  Eye  diagrams  with  and  without  channel  dispersion.  The  eye  is  closed  for  the  channel 
considered,  which  means  that  reliable  symbol  decisions  are  not  possible  without  equalization. 


Eye  diagrams:  A  classical  technique  for  visualizing  the  effect  of  ISI  is  the  eye  diagram.  It 
is  constructed  by  overlapping  multiple  segments  of  the  received  waveform  over  a  fixed  window, 
which  tells  us  how  different  combinations  of  symbols  could  potentially  create  ISI.  For  an  ideal 
channel  and  square  root  Nyquist  pulses  at  either  end,  the  eye  is  open,  as  shown  in  Figure  8.5(a). 
However, for the dispersive channel in Figure 8.4(a), we see from Figure 8.5(b) that the eye is closed. An open
eye implies that, by an appropriate choice of sampling times, we can make reliable single-sample
symbol decisions, while a closed eye means that more sophisticated equalization techniques are
needed for symbol recovery.

Physically,  an  eye  diagram  can  be  generated  using  an  oscilloscope  with  the  baseband  modulated 
signal  as  the  vertical  input,  with  horizontal  sweep  triggered  at  the  symbol  rate.  A  code  fragment 
for  generating  the  eye  pattern  from  discrete  time  samples  at  rate  m/T  is  given  below.  (While 
Matlab  has  its  own  eye  diagram  routine,  this  code  fragment  is  provided  in  order  to  clearly  convey 
the  concept.)  The  output  of  the  receive  filter  generated  in  code  fragment  8.1.2  is  the  input  to 
this  fragment,  but  in  general,  we  could  plot  an  eye  diagram  based  on  the  baseband  waveform  at 
any  stage  in  the  system.  For  complex  baseband  signals,  we  would  plot  the  eye  diagrams  for  the 
I  and  Q  components  separately. 

Code  Fragment  8.1.3  (Eye  diagram) 

%remove edge effects before doing eye diagram
r1 = receive_output(m/2:m/2+nsymbols*m-1);
%horizontal display length in number of symbol intervals
K=2;
%break into non-overlapping traces
R1=reshape(r1,K*m,length(r1)/(K*m));
%now enforce continuity across traces
%(append to each trace the first element of the next trace)
row1 = R1(1,:);
L=length(row1);
row_pruned = row1(2:L);
R_pruned = R1(:,1:L-1);
R2 = [R_pruned;row_pruned];
time = (0:K*m)/m; %time as a multiple of symbol interval
plot(time,R2);
xlabel('t/T');

8.1.2  Noise  Model  and  SNR 

In  continuous  time,  our  model  for  the  noisy  input  to  the  receive  filter  is 

$$y(t) = \sum_n b[n]\, p(t - nT) + n(t) \tag{8.1}$$

where $p(t) = (g_{TX} * g_C)(t)$ is the “effective pulse” given by the cascade of the transmit pulse
and the channel filter, $\{b[n]\}$ is the symbol sequence, which is in general complex-valued, and
$n(t)$ is complex WGN with PSD $\sigma^2 = N_0/2$ per dimension. We translate this model directly into
discrete time by constraining $t = kT/m + \tau$, where $m/T$ is the sampling rate ($m$ a positive integer)
and $\tau$ equals the sampling offset. The noise at the input to the receive filter is now modeled as discrete
time white Gaussian noise (WGN) with variance $\sigma^2 = N_0/2$ per dimension. As we well know from
Chapter 6, the absolute value of the noise variance is meaningless unless we also specify the signal
scaling, hence we fix either the signal or noise strength, and set the other based on SNR measures
such as $E_b/N_0$ or $E_s/N_0$. Here $E_s = E\left[|b[n]|^2\right] \|p\|^2$ for the model (8.1), and $E_b = E_s/\log_2 M$
as usual, where $M$ is the constellation size. Inner products and norms are computed in discrete
time.

Note  that,  with  the  preceding  convention,  the  noise  energy  in  a  fixed  time  interval  scales  up  with 
the  sampling  rate,  and  so  does  the  signal  energy  (since  we  have  more  samples  whose  energies 
we  are  adding  up),  with  the  SNR  converging  to  the  continuous-time  SNR  as  the  sampling  rate 
gets  large.  However,  for  a  sampling  rate  that  is  a  small  multiple  of  the  symbol  rate,  the  SNR  for 
the discrete time system can, in general, be different from that in the original continuous time
system. We do not worry about this distinction here.

We now provide a code fragment which adds discrete time WGN to the receive filter input,
resulting in colored noise at the output. We add this to the signal component already computed
in code fragment 8.1.2.


Code  Fragment  8.1.4  (Noise  modeling) 

bn_energy = 1; %for BPSK with current normalization
Es = bn_energy*(pulse'*pulse); %pulse is cascade of transmit and channel filters
constellation_size=2; %for BPSK
Eb = Es/log2(constellation_size);
%specify Eb/N0 in dB
ebnodb=5;
ebnoraw = 10^(ebnodb/10); %raw Eb/N0
N0=Eb/ebnoraw;
%noise standard deviation per dimension
sigma = sqrt(N0/2);
%noise at input to receive filter
noise_receive_input = sigma*randn(size(receive_input));
%(would also need to add an imaginary component for complex-valued signals)
%noise_receive_input = noise_receive_input + 1i*sigma*randn(size(receive_input));
%noise at output of receive filter
noise_receive_output = (1/m)*conv(noise_receive_input,receive_filter);
%noisy receive filter output
receive_output_noisy = receive_output + noise_receive_output;


8.2  Linear  equalization 

We  have  seen  that  single-sample  symbol  decisions  are  unreliable  when  the  eye  is  closed.  However, 
what  if  we  are  willing  to  use  multiple  samples  for  each  symbol  decision?  Typically,  the  transmitter 
and  receiver  may  implement  fixed  filters  in  DSP  at  a  faster  sampling  rate  than  the  sampling  rate 
used  eventually  for  equalization.  Thus,  suppose  that  we  have  samples  at  rate  m/T  from  the 
output  of  the  receive  filter,  but  we  now  wish  to  use  rate  q/T  samples  for  equalization,  where 
q  divides  m.  For  example,  we  may  have  m  =  4  and  q  =  2.  We  subsample  the  output  of  the 
receive  filter,  taking  one  out  of  every  m/q  samples,  and  then  use  L  consecutive  samples,  collected 
into a vector $\mathbf{r}[n]$, to make a decision on symbol $b[n]$. We would want to choose these samples
so that the bulk of the response due to $b[n]$ falls within the observation interval over which
we collect these samples. When we want to make a decision on the next symbol $b[n+1]$, we
must slide this observation interval over by $T$ in order to obtain the received vector $\mathbf{r}[n+1]$.
Since our sampling rate is now $q/T$, this corresponds to an offset of $q$ samples between successive
observation intervals. Note that an observation interval typically spans multiple symbol intervals,
so that successive observation intervals overlap. Figures 8.6 and 8.7 illustrate this concept for a
channel of length $L = 4$ obtained by sampling at rate $2/T$, so that $q = 2$. The overlap between
successive observation intervals equals $L - q = 2$ samples.

Figure 8.6: The observation interval used to make a decision on $b[0]$ sees contributions from the
desired symbol $b[0]$ and interfering symbols $b[-1]$ and $b[1]$.

Figure 8.7: The observation interval used to make a decision on $b[1]$ sees contributions from the
desired symbol $b[1]$ and interfering symbols $b[0]$ and $b[2]$. Comparing with Figure 8.6, the roles
of the symbols have shifted by one.

The transmit, channel and receive filters are LTI systems, the noise is stationary, and successive
symbols are input to the system spaced by time $T$. Since the discrete time symbol sequence
is stationary as well, the statistics of the signal at any stage of the system are invariant to shifts
by integer multiples of $T$. Such periodicity in the statistics is termed cyclostationarity. This
implies that the statistics of the noise and ISI seen in different observation intervals are identical:
the only change is in which symbol plays the role of desired symbol. In particular, comparing
Figures 8.6 and 8.7, we see that the roles of desired and interfering symbols shift by one as we go
from the observation interval for $b[0]$ to that for $b[1]$. Thus, an appropriately designed strategy
for handling ISI over a given observation interval should work for other observation intervals as
well. This opens up the possibility of realizing adaptive equalizers which can learn enough about
the statistics of the ISI and noise to compensate for them.

We focus here on linear equalization, which corresponds to using the decision statistic $\mathbf{c}^T\mathbf{r}[n]$ to
estimate $b[n]$, where $\mathbf{c}$ is an appropriately chosen correlator. The choice of $\mathbf{c}$ can be independent
of $n$, by virtue of cyclostationarity. For BPSK signaling, for example, this leads to a decision
rule

$$\hat{b}[n] = \mathrm{sign}\left(\mathbf{c}^T\mathbf{r}[n]\right) \tag{8.2}$$


8.2.1  Adaptive  MMSE  Equalization 

While  constraining  ourselves  to  linear  equalization  is  suboptimal  (discussion  of  optimal  equaliza¬ 
tion  is  beyond  our  present  scope),  we  can  try  to  optimize  c  to  combat  ISI  and  noise.  In  particular, 
the  linear  MMSE  criterion  corresponds  to  choosing  c  so  as  to  minimize  the  mean  squared  error 
(MSE)  between  the  decision  statistic  and  the  desired  symbol,  defined  as 

$$\mathrm{MSE} = J(\mathbf{c}) = E\left[\left(\mathbf{c}^T\mathbf{r}[n] - b[n]\right)^2\right] \tag{8.3}$$

Minimizing  the  MSE  in  this  fashion  leads  to  minimizing  the  contribution  due  to  ISI  and  noise 
at  the  correlator  output,  which  is  clearly  a  desirable  outcome. 

The  MSE  is  a  quadratic  function  of  c,  and  can  therefore  be  minimized  by  setting  its  gradient 
with  respect  to  c  to  zero.  Due  to  linearity,  the  gradient  can  be  taken  inside  the  expectation,  and 
we  obtain 

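$$\nabla_{\mathbf{c}} J(\mathbf{c}) = 2E\left[\mathbf{r}[n]\left(\mathbf{c}^T\mathbf{r}[n] - b[n]\right)\right] = 2E\left[\mathbf{r}[n]\left(\mathbf{r}^T[n]\mathbf{c} - b[n]\right)\right]$$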

Defining 

$$\mathbf{R} = E\left[\mathbf{r}[n]\mathbf{r}^T[n]\right], \qquad \mathbf{p} = E\left[b[n]\mathbf{r}[n]\right] \tag{8.4}$$

we  can  rewrite  the  gradient  of  the  MSE  as 

$$\nabla_{\mathbf{c}} J(\mathbf{c}) = 2\left(\mathbf{R}\mathbf{c} - \mathbf{p}\right) \tag{8.5}$$

Setting  the  gradient  to  zero  yields  the  following  expression  for  the  MMSE  correlator: 

$$\mathbf{c}_{MMSE} = \mathbf{R}^{-1}\mathbf{p} \tag{8.6}$$
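
As a check on the gradient computation, the MSE can be expanded explicitly as a quadratic in the correlator. The following worked step uses only the definitions (8.3) and (8.4), and assumes that $\mathbf{R}$ is invertible; the expression for the minimum value is a standard consequence of (8.6) rather than a formula stated in the text.

```latex
\begin{align*}
J(\mathbf{c}) &= E\left[\left(\mathbf{c}^T\mathbf{r}[n] - b[n]\right)^2\right]
  = \mathbf{c}^T\mathbf{R}\,\mathbf{c} - 2\,\mathbf{c}^T\mathbf{p} + E\left[b^2[n]\right], \\
\nabla_{\mathbf{c}} J(\mathbf{c}) &= 2\mathbf{R}\mathbf{c} - 2\mathbf{p}
  \quad \text{(using the symmetry of } \mathbf{R}\text{), in agreement with (8.5)}, \\
J(\mathbf{c}_{MMSE}) &= E\left[b^2[n]\right] - \mathbf{p}^T\mathbf{R}^{-1}\mathbf{p}.
\end{align*}
```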

In  order  to  compute  this,  we  must  know,  or  be  able  to  estimate,  the  expectations  in  (8.4).  If  we 
know  the  transmit  filter,  the  channel  filter,  the  receive  filter,  the  sampling  times,  and  the  noise 
PSD,  we  can  compute  these  expectations  using  a  model  such  as  (8.13).  However,  we  often  do 
not  have  explicit  knowledge  of  one  or  more  of  these  quantities.  Thus,  an  attractive  approach  in 
practice  is  to  exploit  the  stationarity  of  the  model  as  we  vary  n  to  estimate  expectations  using 
their  empirical  averages.  These  expectations  involve  the  received  vectors  r  [n] ,  which  we  of  course 
have  access  to,  and  the  symbols  b[n],  which  we  assume  we  have  access  to  over  a  training  period  in 
which  a  known  sequence  of  symbols  is  transmitted.  This  approach  leads  to  adaptive  equalization 
techniques  that  do  not  require  explicit  knowledge  or  estimates  of  the  model  parameters. 

Least Squares Adaptation: Assuming that the first $n_{\text{training}}$ symbols are known, least
squares adaptation corresponds to replacing the expectations in (8.4) by their empirical averages
as follows:

$$\hat{\mathbf{R}} = \frac{1}{n_{\text{training}}} \sum_{n=1}^{n_{\text{training}}} \mathbf{r}[n]\mathbf{r}^T[n], \qquad \hat{\mathbf{p}} = \frac{1}{n_{\text{training}}} \sum_{n=1}^{n_{\text{training}}} b[n]\mathbf{r}[n] \tag{8.7}$$

where the normalization by $1/n_{\text{training}}$ is not needed, but is put in to make the averaging
interpretation transparent. The MMSE correlator is now approximated by the least squares solution:

$$\mathbf{c}_{LS} = \hat{\mathbf{R}}^{-1}\hat{\mathbf{p}} \tag{8.8}$$

This correlator can now be used to make decisions on the unknown symbols following the training
period. It can be checked that the preceding solution minimizes the empirical MSE over the
training period:

$$\widehat{\mathrm{MSE}} = \sum_{n=1}^{n_{\text{training}}} \left(\mathbf{c}^T\mathbf{r}[n] - b[n]\right)^2$$
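
The minimization claim can be checked numerically. The sketch below uses synthetic data rather than the simulation model of this chapter: the matrix A (whose rows stand in for the received vectors), the training symbols b, and the sizes L and ntraining are all illustrative assumptions. It accumulates the sums in (8.7) without normalization, solves (8.8), and compares against a direct least squares solve.

```matlab
%sketch: the least squares correlator minimizes the empirical MSE
L = 4; ntraining = 50;
b = sign(randn(ntraining,1)); %stand-in training symbols (BPSK)
b(b==0) = 1;
%row n of A holds a stand-in received vector r[n]' (signal plus noise)
A = randn(ntraining,L) + b*[0.2 1 0.5 -0.3];
%accumulate the sums in (8.7), normalization omitted
Rhat = zeros(L,L);
phat = zeros(L,1);
for n = 1:ntraining,
rn = A(n,:)';
Rhat = Rhat + rn*rn';
phat = phat + b(n)*rn;
end
cLS = Rhat\phat;
%empirical MSE over the training period as a function of the correlator
empirical_mse = @(c) sum((A*c-b).^2);
%the gradient of the empirical MSE, 2*(Rhat*c-phat), should vanish at cLS
gradient_norm = norm(2*(Rhat*cLS-phat));
%a random perturbation of cLS should not reduce the empirical MSE
mse_LS = empirical_mse(cLS);
mse_perturbed = empirical_mse(cLS+0.01*randn(L,1));
%direct least squares solution of the overdetermined system A*c = b
c_direct = A\b;
disp([mse_LS mse_perturbed]);
disp(norm(cLS-c_direct));
```

The agreement with A\b reflects the fact that (8.8) is the solution of the normal equations for this least squares problem.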

Filter implementation of linear equalization: For conceptual clarity, we have introduced
linear equalization as a correlator operating on the received vectors $\{\mathbf{r}[n]\}$ obtained by windowing
the samples at the output of the receive filter. However, an efficient technique for generating
the decision statistics $\mathbf{c}^T\mathbf{r}[n]$ is to pass the received samples through a discrete time filter
matched to $\mathbf{c}$, and then to subsample the output at the symbol rate with an appropriate delay.
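
The equivalence between the correlator and filter implementations can be illustrated with a minimal sketch. The sample sequence r, the correlator c, and the values of q, L, offset and nvec below are illustrative stand-ins (random data, not outputs of the earlier code fragments); the indexing mirrors that of the sliding observation intervals described above.

```matlab
%sketch: decision statistics c'*r[n] via filtering and symbol-rate subsampling
q = 2; L = 6; offset = 1; nvec = 20;
r = randn(q*nvec+L+offset,1); %stand-in for rate q/T samples at receive filter output
c = randn(L,1); %stand-in for a correlator
%direct computation over sliding windows, offset by q samples per symbol
stats_direct = zeros(nvec,1);
for n = 1:nvec,
rn = r(1+q*(n-1)+offset:L+q*(n-1)+offset); %received vector r[n]
stats_direct(n) = c'*rn;
end
%filter implementation: impulse response is the time-reversed correlator
h_equalizer = flipud(c); %would also need conjugation for complex signals
equalizer_output = conv(r,h_equalizer);
%sample filter output at symbol rate after appropriate delay
delay = length(h_equalizer)+offset;
stats_filter = equalizer_output(delay:q:delay+(nvec-1)*q);
max_difference = max(abs(stats_direct-stats_filter));
disp(max_difference);
```

The two computations agree up to floating point error; the filter form produces all of the statistics with a single convolution.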

The  following  code  fragment  implements  and  tests  least  squares  adaptation,  comparing  it  with 
unequalized  estimates  obtained  by  sampling  at  the  peaks  of  the  net  response  to  a  symbol. 


Code  Fragment  8.2.1  (Least  squares  adaptive  equalization) 

%Use code fragments 8.1.1, 8.1.2 with a large value of nsymbols
%(first ntraining symbols assumed to be known)
ntraining = 100; %number of training symbols (example value; set as needed)
%Insert noise using code fragment 8.1.4
%downsample to q/T to get input to equalizer
q=2;
r = receive_output_noisy(1:m/q:length(receive_output_noisy));
%figure out net response to a single symbol at receive filter output
rx_pulse = (1/m)*conv(pulse,receive_filter);
%effective response after downsampling
h = rx_pulse(1:m/q:length(rx_pulse));
%set equalizer length
L=6;
%choose how to align correlator (e.g., to maximize desired vector energy)
desired_energy = conv(h.^2,ones(L,1));
[max_energy loc_max_energy] = max(desired_energy);
%choose offset to align correlator with desired vector to maximize energy
offset = max(loc_max_energy-L,0)
%another option: set equalizer length equal to effective response
%L=length(h);
%offset = 0;

%initialize for least squares adaptation
phat = zeros(L,1);
Rhat = zeros(L,L);
for n = 1:ntraining,
rn=r(1+q*(n-1)+offset:L+q*(n-1)+offset); %current received vector r[n]
phat = phat + symbols(n)*rn;
Rhat = Rhat + rn*rn';
end
%least squares estimate of MMSE correlator
cLS = Rhat\phat; %often more stable computation than inv(Rhat)*phat
%implement equalizer as filter
h_equalizer = flipud(cLS); %would also need conjugation for complex signals
equalizer_output = conv(r,h_equalizer);
%sample filter output at symbol rate after appropriate delay
delay = length(h_equalizer)+offset;
%symbol decision statistics
decision_stats = equalizer_output(delay:q:delay+(nsymbols-1)*q);
%payload = non-training symbols
payload = symbols(ntraining+1:nsymbols);
%estimate of payload (for BPSK)
payload_estimate = sign(decision_stats(ntraining+1:nsymbols));
%number of errors
nerrors = sum(ne(payload,payload_estimate))

%COMPARE WITH UNEQUALIZED ESTIMATES
%unequalized estimates obtained by sampling at peaks of effective response
[maxval maxloc] = max(h);
sampling_times = maxloc:q:(nsymbols-1)*q+maxloc;
unequalized_decision_stats = r(sampling_times);
sampled_outputs = transmit_output(sampling_times);
%estimate of payload (for BPSK)
payload_estimate_unequalized = sign(unequalized_decision_stats(ntraining+1:nsymbols));
%number of errors
nerrors_unequalized = sum(ne(payload,payload_estimate_unequalized))

Putting code fragments 8.1.1, 8.1.2, 8.1.4 and 8.2.1 together, we obtain a simulation model for
adaptive linear equalization over a dispersive channel. As a quick example, for the dispersive
channel considered, at $E_b/N_0$ of 7 dB, we estimate (using nsymbols = 10000, ntraining = 100)
the error probability after equalization at rate $2/T$ ($q = 2$) to be about $3.5 \times 10^{-3}$ and the
unequalized error probability to be about 0.16. Linear equalization is quite effective in this case,
although it exhibits some degradation relative to the ideal BPSK error probability of $7.7 \times 10^{-4}$.
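
The ideal BPSK benchmark quoted above can be reproduced with a short sketch. It assumes the standard expression $Q(\sqrt{2E_b/N_0})$ for BPSK over an AWGN channel without ISI, and evaluates the Q function through the built-in erfc; the variable name pe_ideal is an illustrative assumption.

```matlab
%sketch: ideal BPSK error probability Q(sqrt(2*Eb/N0)) at Eb/N0 of 7 dB
ebnodb = 7;
ebnoraw = 10^(ebnodb/10); %raw Eb/N0
%Q(x) = 0.5*erfc(x/sqrt(2)), so that Q(sqrt(2*ebnoraw)) = 0.5*erfc(sqrt(ebnoraw))
pe_ideal = 0.5*erfc(sqrt(ebnoraw));
disp(pe_ideal);
```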

We can now build on this code base to run a variety of experiments, as suggested in Software
Lab 8.1: for example, probability of error as a function of $E_b/N_0$ for different equalizer lengths,
for different channel models, and for different choices of the transmit and receive filters. Our
model extends easily to complex-valued constellations, as discussed below.

Extension to complex-valued signals: All of the preceding development goes through for
complex-valued constellations and signals, except that vector transposes $\mathbf{x}^T$ are replaced by
conjugate transposes $\mathbf{x}^H$. Indeed, the Matlab code fragments we provide here already include
this level of generality, since we use the conjugate transpose operation x' when computing the
transpose for real-valued x. All that is needed to employ these code fragments is to make the
symbols complex-valued, and to add an imaginary component to the noise model in code fragment
8.1.4 (along with the conjugations noted in the code comments, and conjugating symbols(n) in the
update of phat in code fragment 8.2.1, consistent with (8.9) below). We skip derivations, and state
that the decision statistics are given by $\mathbf{c}^H\mathbf{r}[n]$, the MSE expression is

$$\mathrm{MSE} = J(\mathbf{c}) = E\left[\left|\mathbf{c}^H\mathbf{r}[n] - b[n]\right|^2\right]$$

and the MMSE solution is given by (8.6) as before, with

$$\mathbf{R} = E\left[\mathbf{r}[n]\mathbf{r}^H[n]\right], \qquad \mathbf{p} = E\left[b^*[n]\mathbf{r}[n]\right] \tag{8.9}$$

As before, these statistical expectations can be replaced by empirical averages for a least squares
implementation.

We  now  have  the  background  required  for  a  hands-on  exposure  to  equalization  through  Software 
Lab  8.1. 


8.2.2  Geometric  Interpretation  and  Analytical  Computations 


Computer  simulations  using  the  code  fragments  in  Sections  8.1  and  8.2  show  that  adaptive 
MMSE  equalization  works  well,  at  least  in  the  specific  examples  considered  in  these  sections, 
and  in  Software  Lab  8.1.  We  now  develop  geometric  insight  into  why  linear  equalization  works 
well when it does, and when it might run into trouble. We stick with real-valued signals, but the
results  extend  easily  to  complex-valued  signals,  as  noted  in  the  appropriate  places.  This  section 
is  not  required  for  doing  Software  Lab  8.1. 

Consider  the  example  depicted  in  Figures  8.6  and  8.7,  where  the  overall  sampled  response  (at 
rate  2 /T)  to  a  single  symbol  is  assumed  to  be 

h  =  (...,0, -0.5, 1,0.5, -0.25,0,...) 

Consider  an  observation  interval  (i.e. ,  equalizer  length)  of  length  L  =  4,  aligned  with  the  response 
to  the  desired  symbol  as  depicted  in  Figures  8.6  and  8.7.  As  shown  in  code  fragment  8.2.1,  we 
can  also  choose  smaller  or  larger  observation  intervals,  and  optimize  their  alignment  using  some 
criterion  (in  the  code  fragment,  the  criterion  is  maximizing  the  energy  of  the  desired  response 
falling  into  the  observation  interval).  In  addition  to  the  contribution  to  r[n]  due  to  b[n],  we  also 
have  contributions  from  other  symbols  before  and  after  it  in  the  sequence,  corresponding  to  parts 
of  appropriately  shifted  versions  of  the  response  h.  For  example,  the  response  to  b[n  +  1]  falling 
in  the  nth  observation  interval  is  obtained  by  shifting  h  by  q  =  2  and  then  windowing.  The 
received  vector  r[n]  can  therefore  be  written  as  follows. 

Model  for  L  =  4:  Two  interfering  symbols  fall  into  the  observation  interval.  The  observation 
interval  is  large  enough  to  accommodate  the  entire  response  due  to  the  desired  symbol. 


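$$\mathbf{r}[n] = b[n] \begin{pmatrix} -0.5 \\ 1 \\ 0.5 \\ -0.25 \end{pmatrix} + b[n+1] \begin{pmatrix} 0 \\ 0 \\ -0.5 \\ 1 \end{pmatrix} + b[n-1] \begin{pmatrix} 0.5 \\ -0.25 \\ 0 \\ 0 \end{pmatrix} + \mathbf{w}[n] \tag{8.10}$$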


where our convention is that time progresses downward, and where $\mathbf{w}[n]$ denotes noise. The
vector multiplying $b[n]$ is the desired vector, while the others are interference vectors. Figure 8.6
corresponds to $n = 0$, while Figure 8.7 corresponds to $n = 1$.

In order to obtain the preceding model, the vector corresponding to a given symbol is obtained
by appropriately shifting $\mathbf{h}$, and then windowing to the observation interval. In order to ensure
that the modeling approach is clear, we also provide the model for $L = 3$, where the observation
interval is lined up with the first three elements of the response to the desired symbol, and $L = 6$,
where the observation interval contains two additional samples, one on either side of the response
to the desired symbol.

Model  for  L  =  3:  The  observation  interval  is  smaller  than  the  desired  symbol  response.  Two 
interfering  symbols  fall  in  the  interval. 


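$$\mathbf{r}[n] = b[n] \begin{pmatrix} -0.5 \\ 1 \\ 0.5 \end{pmatrix} + b[n+1] \begin{pmatrix} 0 \\ 0 \\ -0.5 \end{pmatrix} + b[n-1] \begin{pmatrix} 0.5 \\ -0.25 \\ 0 \end{pmatrix} + \mathbf{w}[n] \tag{8.11}$$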

Model for $L = 6$: The observation interval is larger than the desired symbol response. Four
interfering symbols fall in the interval.

$$\mathbf{r}[n] = b[n] \begin{pmatrix} 0 \\ -0.5 \\ 1 \\ 0.5 \\ -0.25 \\ 0 \end{pmatrix} + b[n+1] \begin{pmatrix} 0 \\ 0 \\ 0 \\ -0.5 \\ 1 \\ 0.5 \end{pmatrix} + b[n-1] \begin{pmatrix} 1 \\ 0.5 \\ -0.25 \\ 0 \\ 0 \\ 0 \end{pmatrix} + b[n+2] \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ -0.5 \end{pmatrix} + b[n-2] \begin{pmatrix} -0.25 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} + \mathbf{w}[n] \tag{8.12}$$


Vector model for ISI: In general, we can write the received vector over observation interval $n$
as follows:

$$\mathbf{r}[n] = b[n]\mathbf{u}_0 + \sum_{k \neq 0} b[n+k]\,\mathbf{u}_k + \mathbf{w}[n] \tag{8.13}$$

where $b[n]$, $\mathbf{u}_0$ are the desired symbol and vector, respectively; $b[n+k]$, $\mathbf{u}_k$ for $k \neq 0$ are
interference symbols and vectors, respectively; and $\mathbf{w}[n] \sim N(\mathbf{0}, \mathbf{C}_w)$ denotes the vector of noise
samples at the output of the receive filter, windowed to the current observation interval. For
an equalizer working with rate $q/T$ samples, we have already noted that successive observation
intervals are offset by $q$ samples. Clearly, the structure of the ISI remains the same as we go from
observation interval $n$ to $n+1$, but the roles of the symbols are shifted by one: for the $(n+1)$st
observation interval, $b[n+1]$ is the desired symbol multiplying $\mathbf{u}_0$, while $b[n+1+k]$ for $k \neq 0$
is the interfering symbol multiplying $\mathbf{u}_k$.
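
The shift-and-window construction of the signal vectors can be sketched in a few lines. The code below is illustrative rather than one of the book's fragments: the parameter pre (the number of window samples preceding the start of the desired response), the bound kmax, and the variable names are assumptions introduced here.

```matlab
%sketch: build the signal vectors u_k by shifting h by k*q and windowing
h = [-0.5;1;0.5;-0.25]; %sampled response to a single symbol at rate q/T
q = 2; %samples per symbol interval
L = 4; %length of observation interval (equalizer length)
pre = 0; %window samples preceding the start of the desired response
kmax = ceil((L+length(h))/q); %symbols shifted further cannot reach the window
U = []; %signal vectors as columns
kvals = []; %symbol index offset k for each column
for k = -kmax:kmax,
uk = zeros(L,1);
for i = 1:length(h),
pos = pre + k*q + i; %position of h(i) in the window for symbol b[n+k]
if pos >= 1 && pos <= L,
uk(pos) = h(i);
end
end
if any(uk ~= 0), %keep only symbols that contribute to the window
U = [U uk];
kvals = [kvals k];
end
end
disp(kvals);
disp(U);
```

With the settings shown, the columns of U are the three vectors in (8.10), ordered as $k = -1, 0, 1$. Setting L = 3 with pre = 0 gives the vectors in (8.11), and L = 6 with pre = 1 gives the five vectors in (8.12).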

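Modeling the output of a linear correlator: A linear correlator $\mathbf{c}$ operating on the received
vector produces the following output:

$$\mathbf{c}^T\mathbf{r}[n] = b[n]\,\mathbf{c}^T\mathbf{u}_0 + \sum_{k \neq 0} b[n+k]\,\mathbf{c}^T\mathbf{u}_k + \mathbf{c}^T\mathbf{w}[n] \tag{8.14}$$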


where the first term is the desired term, the second term is the ISI at the correlator output,
and the third term is the noise at the correlator output. While the ultimate performance metric
of interest is the error probability, a convenient metric that is easy to compute is the
signal-to-interference-plus-noise ratio (SINR) at the output of the linear correlator, defined as the ratio of
the average energy of the desired term, to those of the undesired terms:

$$\mathrm{SINR} = \frac{E\left[\left|b[n]\,\mathbf{c}^T\mathbf{u}_0\right|^2\right]}{E\left[\left|\sum_{k \neq 0} b[n+k]\,\mathbf{c}^T\mathbf{u}_k + \mathbf{c}^T\mathbf{w}[n]\right|^2\right]} \tag{8.15}$$


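Assuming that the symbols are uncorrelated with $E[|b[n]|^2] = \sigma_b^2$ and are independent of the
noise, we obtain the following expression for the SINR:

$$\mathrm{SINR} = \frac{\sigma_b^2 \left|\mathbf{c}^T\mathbf{u}_0\right|^2}{\sigma_b^2 \sum_{k \neq 0} \left|\mathbf{c}^T\mathbf{u}_k\right|^2 + \mathbf{c}^T\mathbf{C}_w\mathbf{c}} \tag{8.16}$$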
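
As an illustration of how (8.16) might be evaluated, the following sketch computes the SINR for the $L = 4$ model (8.10) with white noise, for two choices of correlator. The values of sigma_b2 and sigma2 are illustrative assumptions. The expressions $\mathbf{R} = \sigma_b^2 \sum_k \mathbf{u}_k\mathbf{u}_k^T + \mathbf{C}_w$ and $\mathbf{p} = \sigma_b^2\mathbf{u}_0$ used for the MMSE correlator follow from (8.4) and (8.13) for zero-mean uncorrelated symbols independent of the noise; they are derived here rather than quoted from the text.

```matlab
%sketch: SINR (8.16) for the L = 4 model (8.10) with white noise
u0 = [-0.5;1;0.5;-0.25]; %desired vector
Uint = [0.5 0;-0.25 0;0 -0.5;0 1]; %interference vectors u_{-1} and u_1 as columns
sigma_b2 = 1; %symbol energy (BPSK)
sigma2 = 0.1; %noise variance per sample (example value)
Cw = sigma2*eye(4); %white noise covariance
%SINR as a function of the correlator c, following (8.16)
sinr = @(c) sigma_b2*abs(c'*u0)^2/(sigma_b2*sum(abs(c'*Uint).^2)+c'*Cw*c);
%correlator matched to the desired vector (ignores ISI)
sinr_mf = sinr(u0);
%MMSE correlator (8.6), with R and p computed from the model
R = sigma_b2*(u0*u0'+Uint*Uint')+Cw;
p = sigma_b2*u0;
cmmse = R\p;
sinr_mmse = sinr(cmmse);
disp([sinr_mf sinr_mmse]);
```

Since the SINR in (8.16) is unchanged when the correlator is scaled, only the direction of each correlator matters in this comparison.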

Choosing $\mathbf{c}$ to minimize the MSE (8.3) means that we would like to have $\mathbf{c}^T\mathbf{r}[n] \approx b[n]$. This
means that, if the linear MMSE equalizer is working, then $\mathbf{c}^T\mathbf{u}_0 \approx 1$, and the ISI terms $\mathbf{c}^T\mathbf{u}_k$,
$k \neq 0$, and the output noise variance $\mathbf{c}^T\mathbf{C}_w\mathbf{c}$, are small. The MMSE criterion represents a
tradeoff between ISI and noise at the output. To see why, let us consider the closely related
criterion of zero-forcing equalization. While the noise in the example considered in our code
fragments is colored, let us first consider white noise for simplicity: $\mathbf{w}[n] \sim N(\mathbf{0}, \sigma^2\mathbf{I})$, so that
the output noise $\mathbf{c}^T\mathbf{w}[n] \sim N(0, \sigma^2\|\mathbf{c}\|^2)$.


Figure 8.8: The zero-forcing correlator projects the received signal along $P_I^\perp\mathbf{u}_0$, the projection
of the desired signal vector orthogonal to the interference subspace.


The geometry of zero-forcing equalization: The zero-forcing (ZF) equalizer is a linear
equalizer chosen to set the ISI terms at the output exactly to zero:

$$\mathbf{c}^T\mathbf{u}_k = 0, \quad k \neq 0 \tag{8.17}$$

while scaling the desired term to the right level:

$$\mathbf{c}^T\mathbf{u}_0 = 1 \tag{8.18}$$

The first condition (8.17) means that $\mathbf{c}$ must be orthogonal to the interference subspace, which is
our term for the subspace spanned by the interference vectors $\{\mathbf{u}_k, k \neq 0\}$. If (8.17) is satisfied,
then the second condition (8.18) can only be satisfied if the desired vector $\mathbf{u}_0$ does not lie in the
interference subspace, otherwise we would have $\mathbf{c}^T\mathbf{u}_0 = 0$ (why?). Thus, the zero-forcing equalizer
exists only if the desired vector $\mathbf{u}_0$ is linearly independent of the interference vectors $\{\mathbf{u}_k, k \neq 0\}$,
in which case it has a nonzero component $P_I^\perp\mathbf{u}_0$ orthogonal to the interference subspace, as shown
in Figure 8.8. In this case, if we choose $\mathbf{c}$ to be a scalar multiple of this orthogonal component,
then (8.17) is satisfied by construction, and (8.18) can be satisfied by choosing the scale factor
appropriately, as discussed shortly. And indeed, while the solution to (8.17) and (8.18) need not
be unique, it can be shown (see Problem 8.5) that choosing $\mathbf{c}_{ZF} = \alpha P_I^\perp\mathbf{u}_0$ is optimal (in terms of
minimizing error probability or maximizing SNR) among all possible ZF solutions, assuming that
the noise vector $\mathbf{w}[n]$ is white Gaussian. As we shall see, the performance of the ZF correlator
depends on the length of this orthogonal projection $P_I^\perp\mathbf{u}_0$ relative to that of the desired signal
vector $\mathbf{u}_0$: the smaller this relative length, the poorer the performance.

For the model (8.10) for an equalizer of length $L = 4$, the signal vectors live in a space of
dimension 4, with 2 interference vectors. It is quite clear that the desired vector is indeed
linearly independent of the interference vectors, and we expect the ZF correlator to exist. For
the model (8.11) for $L = 3$, we again have 2 interference vectors, and it again appears that
the ZF correlator should exist, although we would expect the performance to be poorer because
the relative length of the orthogonal projection can be expected to be smaller. Of course, such
intuition must be quantified by explicit computation of the ZF correlator and its performance,
which we discuss next.

Computation of the ZF correlator: Let us now obtain an explicit expression for the ZF
correlator given the vector ISI model (8.13). Suppose that the signal vectors $\{\mathbf{u}_k\}$ are written as
columns in a matrix $\mathbf{U}$ as follows:

$$\mathbf{U} = [\ldots\ \mathbf{u}_{-1}\ \mathbf{u}_0\ \mathbf{u}_1\ \ldots] \tag{8.19}$$

The ZF conditions (8.17)-(8.18) can be compactly written as

$$\mathbf{U}^T \mathbf{c}_{ZF} = \mathbf{e} \tag{8.20}$$

where $\mathbf{e} = (\ldots, 0, 1, 0, \ldots)^T$ is a unit vector with one corresponding to the column $\mathbf{u}_0$ and zeros
corresponding to columns $\mathbf{u}_k$, $k \neq 0$. Further, we can write the ZF correlator as a linear combination
of the signal vectors (any component orthogonal to all of the $\{\mathbf{u}_k\}$ can only add noise):

$$\mathbf{c}_{ZF} = \mathbf{U}\mathbf{a} \tag{8.21}$$

Plugging into (8.20), we obtain

$$\mathbf{U}^T\mathbf{U}\mathbf{a} = \mathbf{e}$$

so that

$$\mathbf{a} = (\mathbf{U}^T\mathbf{U})^{-1}\mathbf{e}$$

assuming invertibility, which in turn requires that the signal vectors $\{\mathbf{u}_k\}$ are linearly independent
(see Problem 8.6). Substituting into (8.21), we obtain that

$$\mathbf{c}_{ZF} = \mathbf{U}(\mathbf{U}^T\mathbf{U})^{-1}\mathbf{e}, \quad \text{ZF correlator for white noise} \tag{8.22}$$


Noise enhancement: By "looking" along the direction of the orthogonal component $\mathbf{P}_I^{\perp}\mathbf{u}_0$
shown in Figure 8.8, the ZF equalizer nulls out the interference vectors. When we plug in (8.17)-(8.18),
the output SINR expression in (8.16) reduces to the output SNR. Setting $\mathbf{C}_w = \sigma^2 \mathbf{I}$, we
obtain

$$\mathrm{SNR}_{ZF} = \frac{\sigma_b^2}{\sigma^2 \|\mathbf{c}_{ZF}\|^2}, \quad \text{ZF SNR for white noise} \tag{8.23}$$

On the other hand, if we ignore ISI, then we know from Chapter 6 that the optimal correlator in
AWGN is a scalar multiple of the desired vector $\mathbf{u}_0$. Relative to this "matched filter" solution, the
ZF correlator incurs loss in SNR, termed noise enhancement. The reason that we say the noise
is getting enhanced (as opposed to the signal getting reduced) is that, if we scale the correlator
to keep the desired contribution at the output constant as in (8.18), then the degradation in
SNR corresponds to an increase in the noise variance. For an ideal system with no ISI, the
received vector is given by $\mathbf{r}[n] = b[n]\mathbf{u}_0 + \mathbf{w}[n]$. Setting $\mathbf{c} = \mathbf{u}_0$, we have
$\mathbf{c}^T\mathbf{r}[n] = b[n]\|\mathbf{u}_0\|^2 + N(0, \sigma^2\|\mathbf{u}_0\|^2)$, from which it is easy to see that the output SNR is given by

$$\mathrm{SNR}_{MF} = \frac{\sigma_b^2 \|\mathbf{u}_0\|^4}{\sigma^2 \|\mathbf{u}_0\|^2} = \frac{\sigma_b^2 \|\mathbf{u}_0\|^2}{\sigma^2}, \quad \text{matched filter bound for white noise} \tag{8.24}$$

This is termed the matched filter (MF) bound on SNR, and is an unrealizable (because we have
ignored ISI) benchmark that we can compare the performance of linear equalization strategies
against. In particular, the noise enhancement $\zeta$ can be defined as the ratio by which the ZF SNR
is smaller than the MF benchmark:

$$\zeta = \frac{\mathrm{SNR}_{MF}}{\mathrm{SNR}_{ZF}}, \quad \text{noise enhancement for white noise} \tag{8.25}$$


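Let us first interpret this geometrically. Setting $\mathbf{c}_{ZF} = \alpha \mathbf{P}_I^{\perp}\mathbf{u}_0$, the condition (8.18) corresponds
to

$$1 = \langle \mathbf{c}_{ZF}, \mathbf{u}_0 \rangle = \alpha \langle \mathbf{P}_I^{\perp}\mathbf{u}_0, \mathbf{u}_0 \rangle = \alpha \|\mathbf{P}_I^{\perp}\mathbf{u}_0\|^2 \tag{8.26}$$

The last equality follows because $\mathbf{u}_0$ decomposes into its projection onto the interference subspace
$\mathbf{P}_I\mathbf{u}_0$ and its orthogonal projection $\mathbf{P}_I^{\perp}\mathbf{u}_0$. Since these two components are orthogonal by
definition, we have

$$\langle \mathbf{P}_I^{\perp}\mathbf{u}_0, \mathbf{u}_0 \rangle = \langle \mathbf{P}_I^{\perp}\mathbf{u}_0, \mathbf{P}_I\mathbf{u}_0 \rangle + \langle \mathbf{P}_I^{\perp}\mathbf{u}_0, \mathbf{P}_I^{\perp}\mathbf{u}_0 \rangle = 0 + \|\mathbf{P}_I^{\perp}\mathbf{u}_0\|^2$$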


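We see from (8.26) that

$$\alpha = \frac{1}{\|\mathbf{P}_I^{\perp}\mathbf{u}_0\|^2}$$

In other words, a ZF correlator satisfying (8.17)-(8.18) can be written in terms of the projection
of the desired vector orthogonal to the interference subspace as follows:

$$\mathbf{c}_{ZF} = \frac{\mathbf{P}_I^{\perp}\mathbf{u}_0}{\|\mathbf{P}_I^{\perp}\mathbf{u}_0\|^2} \tag{8.27}$$

from which it follows that

$$\|\mathbf{c}_{ZF}\|^2 = \frac{1}{\|\mathbf{P}_I^{\perp}\mathbf{u}_0\|^2} \tag{8.28}$$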


Thus, the smaller the orthogonal projection $\mathbf{P}_I^{\perp}\mathbf{u}_0$, the more we must scale up the correlator
in order to maintain the normalization (8.18) of the contribution of the desired symbol at the
output. Plugging into (8.25), we obtain the following geometric interpretation for the noise
enhancement:

$$\zeta = \frac{\mathrm{SNR}_{MF}}{\mathrm{SNR}_{ZF}} = \frac{\|\mathbf{u}_0\|^2}{\|\mathbf{P}_I^{\perp}\mathbf{u}_0\|^2} \tag{8.29}$$

This is intuitively reasonable: the noise enhancement is the inverse of the factor by which the
effective signal energy is reduced because of looking along the orthogonal projection $\mathbf{P}_I^{\perp}\mathbf{u}_0$, instead
of along the desired vector $\mathbf{u}_0$.

The  following  code  fragment  computes  the  ZF  correlator  and  the  noise  enhancement  for  the 
model  (8.10).  We  find  that  the  noise  enhancement  is  4.4  dB. 


Code Fragment 8.2.2 Computing the ZF solution and its noise enhancement

%ZF example
%matrix with signal vectors as columns
U=transpose([0.5 -0.25 0 0;-0.5 1 0.5 -0.25;0 0 -0.5 1]);
%unit vector with one corresponding to u0
e=transpose([0 1 0]);
%coeffs of linear combination
a=(U'*U)\e;
%ZF correlator: linear comb of cols of U
czf=U*a;
%desired vector is second column
u0=U(:,2);
%check that ZF equations are satisfied
U'*czf %should be equal to the vector e
%noise enhancement
noise_enhancement = (u0'*u0)*(czf'*czf)
%in dB
noise_enhancement_db = 10*log10(noise_enhancement)

While  the  matrix  U  is  specified  manually  in  the  preceding  code  fragment,  for  longer  channels,  we 
would  typically  automate  the  generation  of  U  given  the  channel  impulse  response  h,  the  equalizer 
length  L,  the  oversampling  factor  q,  and  the  specification  of  how  the  observation  interval  lines 
up  with  the  response  to  the  desired  symbol  (i.e. ,  how  to  generate  u0  from  h). 
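
As a cross-check on the geometric interpretation, the following sketch recomputes the ZF correlator from the orthogonal projection, as in (8.27), and compares it with the matrix formula (8.22) for the same matrix U used in Code Fragment 8.2.2. The variable names (`UI`, `u0_perp`, `czf_geom`) are illustrative choices rather than notation from the text, and the projection is obtained by a least squares fit onto the interference vectors.

```matlab
% Cross-check of (8.27)-(8.29) against (8.22) for the U of Code Fragment 8.2.2
U = transpose([0.5 -0.25 0 0; -0.5 1 0.5 -0.25; 0 0 -0.5 1]);
u0 = U(:,2);                 % desired vector (second column)
UI = U(:,[1 3]);             % interference vectors as columns

% projection of u0 onto the interference subspace (least squares fit)
proj_I = UI*((UI'*UI)\(UI'*u0));
% component of u0 orthogonal to the interference subspace
u0_perp = u0 - proj_I;

% ZF correlator from the geometric expression (8.27)
czf_geom = u0_perp/(u0_perp'*u0_perp);
% ZF correlator from the matrix expression (8.22)
e = [0; 1; 0];
czf = U*((U'*U)\e);

% noise enhancement from (8.29)
zeta = (u0'*u0)/(u0_perp'*u0_perp);
zeta_db = 10*log10(zeta);

fprintf('max difference between (8.22) and (8.27): %g\n', max(abs(czf - czf_geom)));
fprintf('noise enhancement: %.2f dB\n', zeta_db);
```

The two correlators should agree to within rounding error, and the noise enhancement should match the value obtained from Code Fragment 8.2.2.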

ZF correlator for colored noise: Let us now discuss how to generalize the expressions for
the ZF correlator and its noise enhancement for colored noise, where $\mathbf{w}[n]$ has covariance matrix
$\mathbf{C}_w$ (assumed to be strictly positive definite, and hence invertible). We limit ourselves here to
stating the results; guidance for deriving these results is provided in Problem 8.9. The optimal
ZF solution, in terms of maximizing the output SNR while satisfying (8.17)-(8.18), is given by

$$\mathbf{c}_{ZF} = \mathbf{C}_w^{-1}\mathbf{U}\left(\mathbf{U}^T\mathbf{C}_w^{-1}\mathbf{U}\right)^{-1}\mathbf{e}, \quad \text{ZF correlator for colored noise} \tag{8.30}$$

The corresponding SNR is given by

$$\mathrm{SNR}_{ZF} = \frac{\sigma_b^2}{\mathbf{c}_{ZF}^T\mathbf{C}_w\mathbf{c}_{ZF}}, \quad \text{ZF SNR for colored noise} \tag{8.31}$$

If there were no ISI, then the optimal correlator is given by the whitened matched filter $\mathbf{c} = \mathbf{C}_w^{-1}\mathbf{u}_0$,
and the corresponding matched filter bound on SNR is given by

$$\mathrm{SNR}_{MF} = \sigma_b^2\,\mathbf{u}_0^T\mathbf{C}_w^{-1}\mathbf{u}_0, \quad \text{matched filter bound for colored noise} \tag{8.32}$$

Proceeding as before, the noise enhancement is given by

$$\zeta = \frac{\mathrm{SNR}_{MF}}{\mathrm{SNR}_{ZF}} = \left(\mathbf{u}_0^T\mathbf{C}_w^{-1}\mathbf{u}_0\right)\left(\mathbf{c}_{ZF}^T\mathbf{C}_w\mathbf{c}_{ZF}\right), \quad \text{noise enhancement for colored noise} \tag{8.33}$$

The reader is encouraged to check that, when we set $\mathbf{C}_w = \sigma^2\mathbf{I}$ in the preceding expressions, we
recover the expressions derived earlier for white noise.
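
The colored-noise expressions can be exercised numerically along the following lines. This is only a sketch: the noise covariance used here (an exponentially decaying correlation with hypothetical parameters `sigma2` and `rho`) is an assumption made for illustration and is not part of the model (8.10). The last few lines repeat the computation with white noise to confirm that (8.33) then reduces to the white-noise value.

```matlab
% ZF correlator and noise enhancement for colored noise, (8.30) and (8.33)
U = transpose([0.5 -0.25 0 0; -0.5 1 0.5 -0.25; 0 0 -0.5 1]);
u0 = U(:,2);
e = [0; 1; 0];

sigma2 = 0.1;                          % hypothetical noise variance
rho = 0.5;                             % hypothetical correlation between adjacent samples
Cw = sigma2*toeplitz(rho.^(0:3));      % assumed positive definite covariance

CinvU = Cw\U;                          % Cw^{-1} U without forming the inverse
czf = CinvU*((U'*CinvU)\e);            % (8.30)
zf_check = U'*czf;                     % should equal e
zeta = (u0'*(Cw\u0))*(czf'*Cw*czf);    % (8.33)

% white-noise special case: Cw = sigma2*I
Cw_white = sigma2*eye(4);
czf_white = (Cw_white\U)*((U'*(Cw_white\U))\e);
zeta_white = (u0'*(Cw_white\u0))*(czf_white'*Cw_white*czf_white);

fprintf('colored noise enhancement: %.2f dB\n', 10*log10(zeta));
fprintf('white noise enhancement:   %.2f dB\n', 10*log10(zeta_white));
```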

MMSE correlator: While we have seen how to adaptively implement the MMSE equalizer, if
we are given the vector ISI model (8.13), then we can compute the MMSE solution analytically
(see Problem 8.8 for the derivation) as follows:

$$\mathbf{c}_{MMSE} = \mathbf{R}^{-1}\mathbf{p}, \qquad \mathbf{R} = \sigma_b^2\,\mathbf{U}\mathbf{U}^T + \mathbf{C}_w, \quad \mathbf{p} = \sigma_b^2\,\mathbf{u}_0 \tag{8.34}$$


We  state  without  proof  the  following  results: 

(1)  Among  the  class  of  linear  correlators,  the  MMSE  correlator  is  optimal  in  terms  of  SINR. 
Thus,  it  achieves  the  best  tradeoff  between  the  ISI  and  noise  at  the  output,  attaining  an  SINR 
that  is  better  than  the  SNR  attained  by  the  ZF  correlator  (for  the  ZF  correlator,  the  SINR  equals 
the  SNR,  since  there  is  no  residual  ISI  at  the  output). 

(2) The MMSE correlator tends to the ZF correlator (if the latter exists) as the noise variance,
or more generally, the noise covariance matrix, tends to zero. This makes sense: if we can neglect
noise, then the MSE $E[|\mathbf{c}^T\mathbf{r}[n] - b[n]|^2]$ can be driven to zero by forcing the ISI to zero as in (8.17)
and by scaling the desired contribution according to (8.18), since we then obtain $\mathbf{c}^T\mathbf{r}[n] = b[n]$.

We summarize as follows. The zero-forcing equalizer drives the ISI to zero, while the linear
MMSE equalizer trades off ISI and noise at its output so as to maximize the SINR. For large
SNR, the contribution of the ISI is dominant, and the MMSE equalizer tends in the limit to the
zero-forcing equalizer (if it exists), and hence pays the same asymptotic penalty in terms of noise
enhancement. In practice, the MMSE equalizer often performs significantly better than the ZF
equalizer at moderate SNRs, but in order to improve equalization performance at high SNR, one
must look to nonlinear equalization strategies, which are beyond our present scope.
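
The convergence of the MMSE correlator to the ZF correlator can be observed numerically. The sketch below uses the matrix U from Code Fragment 8.2.2 with white noise, and assumes unit symbol energy (`sigma_b2 = 1`) and an arbitrary list of noise variances; both are illustrative choices rather than values taken from the text.

```matlab
% MMSE correlator (8.34) approaches the ZF correlator (8.22) as the noise vanishes
U = transpose([0.5 -0.25 0 0; -0.5 1 0.5 -0.25; 0 0 -0.5 1]);
u0 = U(:,2);
e = [0; 1; 0];
sigma_b2 = 1;                            % assumed symbol energy

czf = U*((U'*U)\e);                      % ZF correlator for white noise

sigma2_values = [1 1e-2 1e-4 1e-6];      % hypothetical noise variances
gap = zeros(size(sigma2_values));
for i = 1:length(sigma2_values)
    Cw = sigma2_values(i)*eye(4);        % white noise covariance
    R = sigma_b2*(U*U') + Cw;
    p = sigma_b2*u0;
    cmmse = R\p;                         % MMSE correlator
    gap(i) = norm(cmmse - czf);          % distance to the ZF correlator
end
disp(gap)                                % shrinks as the noise variance decreases
```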

Extension to complex-valued signals: All of the preceding development applies to
complex-valued constellations and signals, except that vector transposes $\mathbf{x}^T$ are replaced by conjugate
transposes $\mathbf{x}^H$, and the noise covariance matrix must include the effect of both the real and
imaginary parts of the noise.

Noise model: In order to model complex-valued WGN, we set $\mathbf{C}_w = 2\sigma^2\mathbf{I}$. This can be
generated by setting $\mathrm{Re}(\mathbf{w})$ and $\mathrm{Im}(\mathbf{w})$ to be i.i.d. $N(0, \sigma^2)$. More generally, we consider circularly
symmetric, zero mean, complex Gaussian noise vectors $\mathbf{w}$, which are completely characterized
by their complex covariance matrix,

$$\mathbf{C}_w = E\left[(\mathbf{w} - E[\mathbf{w}])(\mathbf{w} - E[\mathbf{w}])^H\right] = E\left[\mathbf{w}\mathbf{w}^H\right]$$

We use the notation $\mathbf{w} \sim CN(\mathbf{0}, \mathbf{C}_w)$. Detailed discussion of circularly symmetric Gaussian
random vectors would distract us from our present purpose. Suffice it to say that circular symmetry
and Gaussianity are preserved under linear transformations. The covariance matrix evolves as
follows: if $\tilde{\mathbf{w}} = \mathbf{B}\mathbf{w}$, then $\mathbf{C}_{\tilde{w}} = \mathbf{B}\mathbf{C}_w\mathbf{B}^H$. Thus, we can generate colored circularly symmetric
Gaussian noise by passing complex WGN through a linear transformation. Specifically, if we
can write $\mathbf{C}_w = \mathbf{B}\mathbf{B}^H$ (this can always be done for a positive definite matrix), we can generate
$\mathbf{w}$ as $\mathbf{w} = \mathbf{B}\mathbf{w}'$, where $\mathbf{w}' \sim CN(\mathbf{0}, \mathbf{I})$.
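
One way to carry out this recipe is sketched below, using a Cholesky factorization to obtain a matrix B with B B^H equal to the desired covariance. The 3 x 3 covariance matrix and the number of samples are hypothetical values chosen for illustration; any Hermitian positive definite matrix could be used instead.

```matlab
% Generating colored circularly symmetric Gaussian noise with covariance Cw
% hypothetical Hermitian positive definite covariance matrix
Cw = [2 0.5+0.5j 0; 0.5-0.5j 2 0.5+0.5j; 0 0.5-0.5j 2];

B = chol(Cw, 'lower');                 % lower triangular factor with Cw = B*B'
factor_error = max(max(abs(B*B' - Cw)));

num_samples = 100000;                  % hypothetical number of noise vectors
% complex WGN with identity covariance: real and imaginary parts i.i.d. N(0,1/2)
w_white = (randn(3,num_samples) + 1j*randn(3,num_samples))/sqrt(2);
w = B*w_white;                         % colored noise, one vector per column

% empirical covariance, which should be close to Cw for many samples
C_est = (w*w')/num_samples;
disp(max(max(abs(C_est - Cw))))
```

Note that the prime operator in Matlab is the conjugate transpose, which is what the complex-valued expressions require.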

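The expressions for the ZF and MMSE correlators are as follows:

$$\mathbf{c}_{ZF} = \mathbf{C}_w^{-1}\mathbf{U}\left(\mathbf{U}^H\mathbf{C}_w^{-1}\mathbf{U}\right)^{-1}\mathbf{e}, \quad \text{ZF correlator for complex-valued signals}$$

$$\mathbf{c}_{MMSE} = \mathbf{R}^{-1}\mathbf{p}, \quad \text{MMSE correlator for complex-valued signals}$$

$$\left(\mathbf{R} = \sigma_b^2\,\mathbf{U}\mathbf{U}^H + \mathbf{C}_w, \quad \mathbf{p} = \sigma_b^2\,\mathbf{u}_0, \quad \sigma_b^2 = E[|b[n]|^2]\right)$$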

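MMSE and SINR: While the SINR for any linear correlator can be computed as in (8.16),
we can obtain particularly simple expressions for the MSE and SINR achieved by the MMSE
correlator, as follows.

$$\mathrm{MMSE} = \sigma_b^2 - \mathbf{p}^H\mathbf{c}_{MMSE} = \sigma_b^2 - \mathbf{p}^H\mathbf{R}^{-1}\mathbf{p}$$

$$\mathrm{SINR}_{max} = \frac{\sigma_b^2}{\mathrm{MMSE}} - 1 \tag{8.36}$$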
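
The relation (8.36) can be checked against a direct SINR computation. In the sketch below, the direct computation assumes that (8.16) has the usual form (desired signal power divided by the sum of residual ISI power and noise power at the correlator output); the symbol energy and noise variance are hypothetical values. The ZF SNR (8.23) is also computed for comparison.

```matlab
% MMSE, SINR via (8.36), and a direct SINR computation for the MMSE correlator
U = transpose([0.5 -0.25 0 0; -0.5 1 0.5 -0.25; 0 0 -0.5 1]);
u0 = U(:,2);
sigma_b2 = 1;                          % assumed symbol energy
sigma2 = 0.1;                          % hypothetical noise variance
Cw = sigma2*eye(4);

R = sigma_b2*(U*U') + Cw;
p = sigma_b2*u0;
cmmse = R\p;

mmse = sigma_b2 - p'*cmmse;            % minimum mean squared error
sinr_formula = sigma_b2/mmse - 1;      % (8.36)

% direct computation (assumed form of (8.16)):
% R - sigma_b2*u0*u0' collects the interference and noise contributions
signal_power = sigma_b2*(cmmse'*u0)^2;
interf_noise_power = cmmse'*(R - sigma_b2*(u0*u0'))*cmmse;
sinr_direct = signal_power/interf_noise_power;

% ZF SNR (8.23) for comparison
czf = U*((U'*U)\[0; 1; 0]);
snr_zf = sigma_b2/(sigma2*(czf'*czf));

fprintf('SINR (8.36): %.4f, direct: %.4f, ZF SNR: %.4f\n', sinr_formula, sinr_direct, snr_zf);
```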


8.3  Orthogonal  Frequency  Division  Multiplexing 

We now introduce an alternative approach to communication over dispersive channels whose goal
is to isolate symbols from each other for any dispersive channel. The idea is to employ frequency
domain transmission, sending symbols $B[n]$ using complex exponentials $s_n(t) = e^{j2\pi f_n t}$, which
have two key properties:

P1) When $s_n(t)$ goes through an LTI system with impulse response $h(t)$ and transfer function
$H(f)$, the output is a scalar multiple of $s_n(t)$. Specifically,

$$e^{j2\pi f_n t} * h(t) = H(f_n)\,e^{j2\pi f_n t}$$

P2) Complex exponentials at different frequencies are orthogonal:

$$\int_{-\infty}^{\infty} s_n(t)\,s_m^*(t)\,dt = \int_{-\infty}^{\infty} e^{j2\pi(f_n - f_m)t}\,dt = \delta(f_m - f_n) = 0, \quad f_n \neq f_m$$

This is analogous to the properties of eigenvectors of matrices. Thus, complex exponentials are
eigenfunctions of any LTI system, as already pointed out in Chapter 2.

Conceptual basis for OFDM: For frequency domain transmission with symbol $B[n]$
modulating the complex exponential $s_n(t) = e^{j2\pi f_n t}$, the transmitted signal is given by

$$u(t) = \sum_n B[n]\,e^{j2\pi f_n t}$$

When this goes through a dispersive channel $h(t)$, we obtain (ignoring noise)

$$(u * h)(t) = \sum_n B[n]\,H(f_n)\,e^{j2\pi f_n t}$$

Note that the symbols $\{B[n]\}$ do not interfere with each other after passing through the channel,
since different complex exponentials are orthogonal. Furthermore, regardless of how complicated
the time domain channel $h(t)$ is, we have managed to parallelize the problem of equalization by
going to the frequency domain. Thus, we only need to estimate and compensate for the complex
scalar $H(f_n)$ in demodulating the $n$th symbol. We now discuss how to translate this concept
into practice.

Finite signaling interval: The first step is to constrain the signaling interval, say to length $T$.
The complex baseband transmitted signal is therefore given by

$$u(t) = \sum_{n=0}^{N-1} B[n]\,e^{j2\pi f_n t}\,I_{[0,T]}(t) = \sum_{n=0}^{N-1} B[n]\,p_n(t) \tag{8.37}$$

where $B[n]$ is the symbol transmitted using the modulating signal $p_n(t) = e^{j2\pi f_n t} I_{[0,T]}(t)$, that is, using the
$n$th subcarrier at frequency $f_n$. Let us now see how the properties P1 and P2 are affected by
time limiting. The time limited tone $p_n(t)$ has Fourier transform $P_n(f) = T\,\mathrm{sinc}((f - f_n)T)\,e^{-j\pi(f - f_n)T}$,
which decays quickly as $|f - f_n|$ takes on values of the order of a few multiples of $1/T$. For a channel whose impulse
response $h(t)$ is approximately timelimited to $T_d$ (the channel delay spread), the transfer function
is approximately constant over frequency intervals of length $B_c$ roughly proportional
to $1/T_d$ (the channel coherence bandwidth). If the signaling interval is large compared to the
channel delay spread ($T \gg T_d$), then $1/T$ is small compared to the channel coherence bandwidth
($\frac{1}{T} \ll B_c$), so that the gain seen by $P_n(f)$ is roughly constant, and the eigenfunction property
is roughly preserved. That is, when $p_n(t)$ goes through a channel with transfer function $H(f)$,
the output has Fourier transform

$$Q_n(f) = H(f)P_n(f) \approx H(f_n)P_n(f) \tag{8.38}$$

Regarding the orthogonality property P2, two complex exponentials that are constrained to an
interval of length $T$ are orthogonal if the frequency separation is an integer multiple of $1/T$:

$$\int_0^T e^{j2\pi f_n t}\,e^{-j2\pi f_m t}\,dt = \frac{e^{j2\pi(f_n - f_m)T} - 1}{j2\pi(f_n - f_m)} = 0, \quad \text{for } (f_n - f_m)T = \text{nonzero integer} \tag{8.39}$$
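
The orthogonality condition (8.39) is easy to confirm numerically. The sketch below evaluates the closed form for one tone pair whose separation is an integer multiple of 1/T and for another pair whose separation is not; the signaling interval and tone frequencies are hypothetical values chosen for illustration. A numerical integration is included as a sanity check on the closed form.

```matlab
% Orthogonality of time-limited tones, following (8.39)
T = 1e-3;                      % hypothetical signaling interval (seconds)
fn = 5/T;                      % hypothetical tone frequency
fm_orth = 3/T;                 % separation 2/T: integer multiple of 1/T
fm_non = 3.5/T;                % separation 1.5/T: not an integer multiple

% closed form inner product from (8.39) as a function of frequency separation df
ip = @(df) (exp(1j*2*pi*df*T) - 1)/(1j*2*pi*df);
ip_orth = ip(fn - fm_orth);    % essentially zero
ip_non = ip(fn - fm_non);      % clearly nonzero

% numerical integration of the same inner product for the non-orthogonal pair
t = linspace(0, T, 20001);
ip_num = trapz(t, exp(1j*2*pi*fn*t).*exp(-1j*2*pi*fm_non*t));

fprintf('|inner product|, separation 2/T:   %g\n', abs(ip_orth));
fprintf('|inner product|, separation 1.5/T: %g\n', abs(ip_non));
```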

Thus, if we wish to send $N$ symbols in parallel using $N$ subcarriers (the term used for each
time-constrained complex exponential), we need a bandwidth of roughly $N/T$ in order to preserve
orthogonality among the timelimited tones. Of course, even if we enforce orthogonality in this
fashion, the timelimited tones are not eigenfunctions of LTI systems, so the output corresponding
to the $n$th timelimited tone is not just a scalar multiple of itself. However, using (8.38), we can
approximate the channel output for the $n$th timelimited tone as $H(f_n)\,e^{j2\pi f_n t}\,I_{[0,T]}(t)$. Thus, the
output corresponding to the transmitted signal (8.37) can be approximated as follows:

$$y(t) \approx \sum_{n=0}^{N-1} B[n]\,H(f_n)\,e^{j2\pi f_n t}\,I_{[0,T]}(t) + n(t) \tag{8.40}$$

To  summarize,  once  we  limit  the  signaling  duration  to  be  finite,  the  ISI  avoidance  property  of 
OFDM  is  approximate  rather  than  exact.  However,  as  we  now  discuss,  orthogonality  between 
subcarriers  can  be  restored  exactly  in  digital  implementations  of  OFDM.  Before  discussing  such 
implementations,  we  provide  some  background  on  discrete  time  signal  processing. 


8.3.1  DSP-centric  implementation 

The  proliferation  of  OFDM  in  commercial  systems  (including  wireline  DSL,  wireless  local  area 
networks  and  wireless  cellular  systems)  has  been  enabled  by  the  implementation  of  its  transceiver 
functionalities  in  DSP,  which  leverages  the  economies  of  scale  of  digital  computation  (Moore’s 
“law”).  For  T  large  enough,  the  bandwidth  of  the  OFDM  signal  u  is  approximately  N/T,  where 
$N$ denotes the number of subcarriers. Thus, we can represent $u(t)$ accurately by sampling at
rate $1/T_s = N/T$, where $T_s$ denotes the sampling interval. From (8.37), taking the subcarrier
frequencies to be $f_n = n/T$ so that $f_n k T_s = nk/N$, the samples are given by

$$u(kT_s) = \sum_{n=0}^{N-1} B[n]\,e^{j2\pi nk/N}$$

We can recognize this simply as the inverse DFT of the symbol sequence $\{B[n]\}$. We make this
explicit in the notation as follows:

$$b[k] = u(kT_s) = \sum_{n=0}^{N-1} B[n]\,e^{j2\pi nk/N} \tag{8.41}$$

If $N$ is a power of 2 (which can be achieved by zeropadding if necessary), the samples $\{b[k]\}$ can
be efficiently generated from the symbols $\{B[n]\}$ using an inverse Fast Fourier Transform (IFFT).
The  complex  baseband  waveform  u(t)  can  now  be  obtained  from  its  samples  by  digital-to-analog 
(D/A)  conversion.  This  implementation  of  an  OFDM  transmitter  is  as  shown  in  Figure  8.9:  the 
bits  are  mapped  to  symbols,  the  symbols  are  fed  in  parallel  to  the  inverse  FFT  (IFFT)  block, 
and  the  complex  baseband  signal  is  obtained  by  D/A  conversion  of  the  samples  (after  insertion 
of  a  cyclic  prefix,  to  be  discussed  after  we  motivate  it  in  the  context  of  receiver  implementation). 
Typically,  the  D/A  converter  is  an  interpolating  filter,  so  that  its  effect  can  be  subsumed  within 
the  channel  impulse  response. 

Note that the relation (8.41) can be inverted as follows:

$$B[n] = \frac{1}{N}\sum_{k=0}^{N-1} b[k]\,e^{-j2\pi nk/N} \tag{8.42}$$


This  is  exploited  in  the  digital  implementation  of  the  OFDM  receiver,  discussed  next. 

Remark on Matlab FFT and IFFT conventions: Matlab puts a factor of $1/N$ in the IFFT rather
than in the FFT as done in (8.41) and (8.42). In both cases, however, IFFT followed by FFT
gives the identity. Note also that Matlab numbers vector entries starting with one, so the FFT
of $x[n]$, $n = 1, \ldots, N$ is given by:

$$X[k] = \sum_{n=1}^{N} x[n]\,e^{-j2\pi(n-1)(k-1)/N}$$

The corresponding IFFT is given by

$$x[n] = \frac{1}{N}\sum_{k=1}^{N} X[k]\,e^{j2\pi(n-1)(k-1)/N}$$

Figure 8.9: DSP-centric implementation of an OFDM transmitter. (N complex symbols in, N complex samples out.)
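
The difference in normalization means that a scale factor of N must be applied when using Matlab's built-in functions to implement (8.41) and (8.42). The following sketch illustrates this for a hypothetical block of N = 8 QPSK symbols, and compares the IFFT output with a direct evaluation of the sum in (8.41).

```matlab
% Implementing (8.41) and (8.42) with Matlab's ifft and fft
N = 8;
% hypothetical QPSK symbols B[0],...,B[N-1]
B = [1+1j; -1+1j; -1-1j; 1-1j; 1+1j; 1-1j; -1+1j; -1-1j];

% (8.41) has no 1/N factor, whereas Matlab's ifft does, hence the scaling by N
b = N*ifft(B);

% direct evaluation of (8.41): entry (k,n) of the matrix is exp(j*2*pi*n*k/N)
k = (0:N-1)';
n = 0:N-1;
b_direct = exp(1j*2*pi*k*n/N)*B;

% (8.42) has a 1/N factor, whereas Matlab's fft does not, hence the division by N
B_rec = fft(b)/N;

fprintf('max |b - b_direct| = %g\n', max(abs(b - b_direct)));
fprintf('max |B_rec - B|    = %g\n', max(abs(B_rec - B)));
```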

We have observed that, once we limit the signaling duration to be finite, the ISI avoidance
property of OFDM is approximate rather than exact. However, as we now show, orthogonality
between subcarriers can be restored exactly in discrete time by using a cyclic prefix, which allows
for efficient demodulation using an FFT. The noiseless received OFDM signal is modeled as

$$v(t) = \sum_{k=0}^{N-1} b[k]\,p(t - kT_s)$$

where the "effective" channel impulse response $p(t)$ includes the effect of the D/A converter at
the transmitter, the physical channel, and the receive filter. When we sample this signal at rate
$1/T_s$, we obtain the discrete-time model

$$v[m] = \sum_{k=0}^{N-1} b[k]\,h[m - k] \tag{8.43}$$

where $\{h[l] = p(lT_s)\}$ is the effective discrete time channel of length $L$, assumed to be smaller
than $N$. We assume, without loss of generality, that $h[l] = 0$ for $l < 0$ and $l \geq L$. We can rewrite
(8.43) as

$$v[m] = \sum_{l=0}^{L-1} h[l]\,b[m - l] \tag{8.44}$$

Let $H$ denote the $N$ point DFT of $h$, where $N > L$:

$$H[n] = \sum_{l=0}^{N-1} h[l]\,e^{-j2\pi nl/N} = \sum_{l=0}^{L-1} h[l]\,e^{-j2\pi nl/N} \tag{8.45}$$

As noted in (8.42), the DFT of $\{b[k]\}$ is the symbol sequence $B[n]$ (the normalization is chosen
differently in (8.42) and (8.45) to simplify the forthcoming equations). In order to parallelize
equalization across the $N$ subcarriers, we would like the noiseless signal to equal $V[n] = H[n]B[n]$.
However, this is not quite satisfied in our setting. We now discuss why not, and how to modify the
system so as to indeed enforce such a relationship. Before doing this, we need a brief discussion
of the DFT and its dual operation, the cyclic convolution.

DFT multiplication and cyclic convolution: The time domain samples $\{b[k]\}$ defined via
the IDFT in (8.41) have range $0 \leq k \leq N - 1$. If we now plug in integer values of $k$ outside
this range, we simply get a periodic extension $b_N[k]$ of these samples with period $N$, satisfying
$b_N[k + N] = b_N[k]$ for all $k$, with $b_N[k] = b[k]$, $0 \leq k \leq N - 1$. Thus, the IDFT can be viewed as
a discrete time analogue of a Fourier series for a periodic time domain sample sequence $\{b_N[k]\}$.
We know that, for the Fourier transform, "multiplication in the frequency domain corresponds to
convolution in the time domain." We skipped the analogous result for Fourier series in Chapter
2 because we did not need it then. Now, however, we establish the appropriate result for the
discrete time Fourier series of interest here: for the DFT, if we multiply two sequences in the
frequency domain, then it corresponds to a cyclic, or periodic, convolution in the time domain.

While the result we wish to establish is general, let us stick with the notation we have already
established. Consider the "desired" sequence $V[n] = H[n]B[n]$, $n = 0, \ldots, N - 1$, that we would
like to get when we take the DFT of the output of the channel. What is the corresponding time
domain sequence? To see this, take the IDFT:

$$v[m] = \sum_{n=0}^{N-1} H[n]\,B[n]\,e^{j2\pi mn/N}$$

Plugging in the expression (8.45) for the channel DFT coefficients, we obtain

$$v[m] = \sum_{n=0}^{N-1}\sum_{l=0}^{L-1} h[l]\,e^{-j2\pi nl/N}\,B[n]\,e^{j2\pi mn/N} = \sum_{l=0}^{L-1} h[l] \sum_{n=0}^{N-1} B[n]\,e^{j2\pi n(m-l)/N} \tag{8.46}$$

Now, the summation over $n$ corresponds to an IDFT, and therefore gives us $b[m - l]$ as long as
$0 \leq m - l \leq N - 1$. Outside this range, it gives us the periodic extension $\{b_N[m - l]\}$:

$$\sum_{n=0}^{N-1} B[n]\,e^{j2\pi n(m-l)/N} = b_N[m - l]$$

Thus, we can write (8.46) as

$$v[m] = \sum_{l=0}^{L-1} h[l]\,b_N[m - l] = \sum_{l=0}^{L-1} h[l]\,b[(m - l) \bmod N] = (h \circledast b)[m] \tag{8.47}$$

where we have introduced the notation $h \circledast b$ to denote the cyclic convolution of $h$ and $b$. While
we have derived this result in our particular context, it is worth stating that it holds generally:
the cyclic convolution modulo $N$ between two sequences $p$ and $q$, each of length at most $N$ (it is
often convenient to think of them as having length $N$, using zeropadding if necessary) is defined
as the convolution over a period of length $N$ of their periodic extensions with period $N$:

$$(p \circledast q)[m] = \sum_{l=0}^{N-1} p_N[l]\,q_N[m - l]$$

The $N$ point DFT of the cyclic convolution of these two sequences is the product of their DFTs.

Figure 8.10 illustrates cyclic convolution modulo $N = 4$ between a sample sequence $\{b[k]\}$ of
length 4 and a channel impulse response of length 2, while Figure 8.11 illustrates the
corresponding linear convolution.
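
The correspondence between cyclic convolution and DFT multiplication can be verified with a few lines of Matlab. The sketch below evaluates (8.47) directly for N = 4 and a channel of length 2, as in Figure 8.10, and compares the result with the inverse FFT of the product of FFTs. The numerical values of the samples and the channel taps are hypothetical.

```matlab
% Cyclic convolution (8.47) versus multiplication of DFTs
N = 4;
b = [1; 2; 3; 4];            % hypothetical samples b[0],...,b[3]
h = [1; 0.5];                % hypothetical channel taps h[0], h[1]
L = length(h);

% direct evaluation of (8.47); Matlab indices are offset by one
v_cyc = zeros(N,1);
for m = 0:N-1
    for l = 0:L-1
        v_cyc(m+1) = v_cyc(m+1) + h(l+1)*b(mod(m-l, N)+1);
    end
end

% same result from the product of N point DFTs (fft zero-pads h to length N)
v_dft = ifft(fft(h, N).*fft(b, N));

disp([v_cyc real(v_dft)])    % the two columns should coincide
```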
[Figure 8.10 graphic: the sample sequence {b[k]} is flipped and slid on a circle against the channel taps h[0], h[1]. Cyclic convolution outputs: v[0] = b[0]h[0] + b[3]h[1], v[1] = b[1]h[0] + b[0]h[1], v[2] = b[2]h[0] + b[1]h[1], v[3] = b[3]h[0] + b[2]h[1].]

Figure 8.10: Example of cyclic convolution. Time progresses clockwise on the circle. The
sequence $\{b[k]\}$ is flipped, and hence goes counter-clockwise. We then "slide" this flipped sequence
clockwise in order to compute successive outputs. Clearly, the output is periodic with period
$N = 4$.


[Figure 8.11 graphic: the sample sequence {b[k]} is flipped and slid past the channel impulse response {h[k]}. Linear convolution outputs: v[0] = b[0]h[0], v[1] = b[1]h[0] + b[0]h[1], v[2] = b[2]h[0] + b[1]h[1], v[3] = b[3]h[0] + b[2]h[1], v[4] = b[3]h[1].]

Figure 8.11: Linear convolution of the two sequences in Figure 8.10 leads to an aperiodic sequence
of length $2 + 4 - 1 = 5$. Note that the outputs at times 1, 2 and 3 coincide with the outputs of
the corresponding cyclic convolution.
[Figure 8.12 graphic: the periodic extension of the sample sequence {b[k]} is flipped and slid past the channel impulse response {h[k]}. Only one extra sample, b[3], is needed to ensure complete overlap with the channel coefficients. With this cyclic prefix, the linear convolution gives v[0] = b[0]h[0] + b[3]h[1]. The outputs v[1] = b[1]h[0] + b[0]h[1], v[2] = b[2]h[0] + b[1]h[1] and v[3] = b[3]h[0] + b[2]h[1] of the linear convolution coincide with the cyclic convolution, while v[4] = b[3]h[1] does not.]

Figure 8.12: Using linear convolution to emulate a circular convolution.


Let us summarize where we now stand. In order to parallelize the channel in the DFT domain, we
need a cyclic convolution in the time domain given by (8.47). However, what the physical channel
actually gives us is the linear convolution of the form (8.44). In order to get the cyclic convolution
we want, we simply need to send an appropriately large segment of a periodic extension of the
time domain samples $\{b[k]\}$ through the channel. Indeed, if we only want to get $N$ outputs
corresponding to a single period of the output of the circular convolution, then we do not need a
full-fledged periodic extension. Figure 8.12 shows how to get the first $N = 4$ outputs of a linear
convolution to be equal to a period of the cyclic convolution in Figure 8.10 by inserting a single
sample. More generally, we need a cyclic prefix of length $L - 1$ for a channel of length $L$, as
discussed below.

Since $L < N$, we can write the circular convolution (8.47) as

$$v[m] = \sum_{l=0}^{\min(L-1,\,m)} h[l]\,b[m - l] + \sum_{l=m+1}^{L-1} h[l]\,b[m - l + N] \tag{8.48}$$

Comparing the linear convolution (8.44) and the cyclic convolution (8.48), we see that they are
identical except when the index $m - l$ takes negative values: in this case, $b[m - l] = 0$ in the
linear convolution, while $b[(m - l) \bmod N] = b_N[m - l] = b[m - l + N]$ contributes to the circular
convolution. Thus, we can emulate a cyclic convolution using the physical linear convolution by
sending a cyclic prefix; that is, by sending

$$b[k] = b_N[k] = b[N + k], \quad k = -(L - 1), -(L - 2), \ldots, -1$$

before we send the samples $b[0], \ldots, b[N - 1]$. That is, we transmit the samples

$$b[N - L + 1], \ldots, b[N - 1], b[0], \ldots, b[N - 1]$$

incurring an overhead of $(L - 1)/N$ which can be made small by choosing $N$ to be large.

In the example depicted in Figure 8.12, $N = 4$ and $L = 2$, and we insert the cyclic prefix $b[3]$,
sending $b[3], b[0], b[1], b[2], b[3]$ (when this is flipped for the pictorial convolution in the figure, the
extra sample $b[3]$ appears at the end).
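
The example with N = 4 and L = 2 can be reproduced numerically as sketched below. The sample and channel values are hypothetical; the point is only that, once the cyclic prefix is inserted and the first L - 1 outputs are discarded, the linear convolution performed by the channel coincides with one period of the cyclic convolution, whereas the linear convolution without a prefix differs in its first output.

```matlab
% Emulating cyclic convolution with a linear convolution plus cyclic prefix
N = 4;
b = [1; 2; 3; 4];               % hypothetical samples b[0],...,b[3]
h = [1; 0.5];                   % hypothetical channel taps, L = 2
L = length(h);

% prepend the last L-1 samples b[N-L+1],...,b[N-1] (Matlab indices N-L+2,...,N)
tx = [b(N-L+2:N); b];
rx = conv(h, tx);               % linear convolution performed by the channel
v_cp = rx(L:L+N-1);             % discard the first L-1 outputs, keep N outputs

v_lin = conv(h, b);             % linear convolution without cyclic prefix
v_cyc = real(ifft(fft(h, N).*fft(b, N)));   % cyclic convolution for reference

disp([v_cyc v_cp v_lin(1:N)])   % first two columns agree; third differs in row 1
```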

At the receiver, the complex baseband signal is sampled at rate $1/T_s$ to obtain noisy versions of
the channel output samples $\{v[m]\}$. After discarding the samples corresponding to the cyclic prefix,
the FFT of these samples yields the model

$$Y[n] = H[n]\,B[n] + N[n] \tag{8.49}$$

where the frequency domain noise samples $N[n]$ are modeled as i.i.d. complex Gaussian, with
$\mathrm{Re}(N[n])$ and $\mathrm{Im}(N[n])$ being i.i.d. $N(0, \sigma^2)$. If the receiver knows the channel, then it can
implement ML reception based on the statistic $H^*[n]Y[n]$. Thus, the task of channel equalization
has been reduced to compensating for scalar channel gains for each subcarrier. This makes
OFDM extremely attractive for highly dispersive channels, for which time domain single-carrier
equalization strategies would be difficult to implement.

Figure 8.13: DSP-centric implementation of an OFDM receiver. Carrier and timing
synchronization blocks are not shown.
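
The complete noiseless chain leading to (8.49) can be sketched as follows: IFFT at the transmitter, cyclic prefix insertion, linear convolution with the channel, cyclic prefix removal, and FFT at the receiver. The symbols and the three-tap channel are hypothetical, and the channel is assumed to be known at the receiver. With noise omitted, the per-subcarrier relation Y[n] = H[n]B[n] holds exactly, and the symbols are recovered by scalar division on each subcarrier.

```matlab
% Noiseless OFDM link illustrating the parallel channel model (8.49)
N = 8;
% hypothetical QPSK symbols B[0],...,B[N-1]
B = [1+1j; -1+1j; -1-1j; 1-1j; 1+1j; 1-1j; -1+1j; -1-1j];
h = [1; 0.5; 0.2j];             % hypothetical discrete time channel, L = 3
L = length(h);

b = N*ifft(B);                  % time domain samples, as in (8.41)
tx = [b(N-L+2:N); b];           % insert cyclic prefix of length L-1
rx = conv(h, tx);               % channel performs a linear convolution (no noise)
y = rx(L:L+N-1);                % discard outputs corresponding to the cyclic prefix

Y = fft(y)/N;                   % DFT with the normalization of (8.42)
H = fft(h, N);                  % channel DFT coefficients, as in (8.45)

model_error = max(abs(Y - H.*B));      % should be at rounding level
stat = conj(H).*Y;              % statistic H*[n]Y[n], equal to |H[n]|^2 B[n] here
B_hat = Y./H;                   % one-tap equalization (H[n] is nonzero here)

fprintf('max |Y - H.*B| = %g\n', model_error);
```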

Channel  estimation:  Channel  estimation  (along  with  timing  and  carrier  synchronization, 
which  are  not  considered  here)  is  accomplished  by  sending  pilot  symbols.  In  Software  Lab  8.2, 
we  send  an  entire  OFDM  symbol  as  a  pilot,  followed  by  a  succession  of  other  OFDM  symbols 
with  payload. 


8.4  MIMO 

The term Multiple Input Multiple Output (MIMO), or space-time communication, refers to
communication systems employing multiple antennas at the transmitter and receiver. We now
provide a brief introduction to key concepts in MIMO systems, along with pointers for further
exploration.

While  much  effort  and  expertise  must  go  into  the  design  of  antennas  and  their  interface  to 
RF  circuits,  the  following  abstract  view  suffices  for  our  purpose  here:  at  the  transmitter,  an 
antenna  transduces  electrical  signals  at  radio  frequencies  into  electromagnetic  waves  at  the  same 
frequency  that  propagate  in  space;  at  the  receiver,  the  antenna  transduces  electromagnetic  waves 
in  a  certain  frequency  range  into  electrical  signals  at  the  same  set  of  frequencies.  Antennas  which 
are  insensitive  to  the  direction  of  arrival/departure  of  the  waves  are  termed  omnidirectional  or 
isotropic  (while  there  is  no  such  thing  as  an  ideal  isotropic  antenna,  it  is  a  convenient  conceptual 
building  block).  Antennas  which  are  sensitive  to  the  direction  of  arrival  or  departure  are  termed 
directional.  It  is  possible  to  synthesize  directional  responses  using  an  array  of  omnidirectional 
antenna  elements,  as  we  discuss  next. 


8.4.1  The  linear  array 

Consider a plane wave impinging on the uniformly spaced linear array shown in Figure 8.14.
The wave sees slightly different path lengths, and hence different phase shifts, in
reaching different antenna elements. The path length difference between two successive elements
is given by $\ell = d \sin\theta$, where $d$ is the inter-element spacing, and $\theta$ the angle of
arrival (AoA) relative to the broadside. The corresponding phase shift across successive elements
is given by $\phi = 2\pi\ell/\lambda = 2\pi d \sin\theta/\lambda$, where $\lambda$ denotes the
wavelength. Another way to get the same result: the delay difference between successive elements
is $\tau = \ell/c$, where $c$ is the speed of wave propagation (equal to $3 \times 10^8$ m/s in free
space). For carrier frequency $f_c$, the corresponding phase shift is
$\phi = 2\pi f_c \tau = 2\pi f_c d \sin\theta/c$. The two expressions are equivalent, since
$\lambda = c/f_c$.

Figure 8.14: A plane wave impinging on a linear array.
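As a quick numerical check that the two expressions for the phase shift agree, the following MATLAB sketch evaluates both. The carrier frequency, spacing, and angle are illustrative assumptions chosen here, not values from the text.

```matlab
% Illustrative values (assumptions, not taken from the text)
c = 3e8;                 % speed of propagation in free space (m/s)
fc = 2.4e9;              % carrier frequency (Hz)
lambda = c/fc;           % wavelength (m)
d = lambda/2;            % inter-element spacing (m)
theta = 30*pi/180;       % angle of arrival relative to broadside (radians)

ell = d*sin(theta);                 % path length difference between successive elements
phi_wavelength = 2*pi*ell/lambda;   % phase shift computed from the wavelength
tau = ell/c;                        % delay difference between successive elements
phi_delay = 2*pi*fc*tau;            % phase shift computed from the delay

fprintf('phi via wavelength = %.6f rad, phi via delay = %.6f rad\n', ...
        phi_wavelength, phi_delay);
```

For half-wavelength spacing and a 30 degree AoA, both expressions evaluate to a phase shift of pi/2 radians.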

The narrowband assumption: What is the effect of the differences in delays seen by successive
elements? Suppose that the wave impinging on element 1 is represented as

$$u_p(t) = u_c(t) \cos 2\pi f_c t - u_s(t) \sin 2\pi f_c t = \mathrm{Re}\left(u(t) e^{j 2\pi f_c t}\right)$$

where $u(t) = u_c(t) + j u_s(t)$ is the complex envelope, assumed to be of bandwidth $W$. Suppose
that the bandwidth $W \ll f_c$: this is the so-called “narrowband assumption,” which typically
holds in most practical settings. For the scenario shown in the figure, the wave arrives
$\tau = \ell/c$ time units earlier at element 2. The wave impinging on element 2 can therefore be
represented as

$$v_p(t) = u_c(t+\tau) \cos 2\pi f_c (t+\tau) - u_s(t+\tau) \sin 2\pi f_c (t+\tau) = \mathrm{Re}\left(u(t+\tau) e^{j\phi} e^{j 2\pi f_c t}\right)$$

where $\phi = 2\pi f_c \tau$. Thus, the complex envelope of the wave at element 2 is
$v(t) = u(t+\tau) e^{j\phi}$. The time shift $\tau$ has two effects on the complex envelope: a time
shift in the baseband waveform $u$, along with a phase rotation $\phi$ due to the carrier. However,
for most settings of interest, the time shift in the baseband waveform can be ignored. To see why,
suppose that the array parameters are such that $\phi$ is of the order of $2\pi$ or less, in which
case $\tau$ is of the order of $1/f_c$ or less. Under the narrowband assumption, the time shift
$\tau$ produces little distortion in $u$. To see this, note that

$$u(t+\tau) \leftrightarrow U(f) e^{j 2\pi f \tau}$$

As $f$ varies over a range $W$, the frequency-dependent phase change produced by the time shift
varies over a range $2\pi W \tau \sim 2\pi W/f_c \ll 2\pi$ for $W \ll f_c$. Thus, we can ignore the
effect of the time shift on the complex envelope, and model the complex envelope at element 2 as
$v(t) \approx u(t) e^{j\phi}$. Similarly, for element 3, the complex envelope is well approximated
as $u(t) e^{j 2\phi}$.
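To get a feel for how small the frequency-dependent phase variation is, the sketch below evaluates the range $2\pi W \tau$ for one set of numbers. The carrier frequency and bandwidth (roughly WiFi-like) are assumptions made here for illustration, and the delay is set to the largest value considered in the argument, $\tau = 1/f_c$.

```matlab
% Illustrative numbers (assumptions): a 20 MHz signal on a 2.4 GHz carrier
fc = 2.4e9;                  % carrier frequency (Hz)
W = 20e6;                    % bandwidth of the complex envelope (Hz)
tau = 1/fc;                  % delay for which the carrier phase shift is 2*pi

phase_spread = 2*pi*W*tau;               % variation of 2*pi*f*tau as f ranges over W (radians)
phase_spread_deg = phase_spread*180/pi;  % the same quantity in degrees

fprintf('Phase variation across the band: %.4f rad (%.2f degrees)\n', ...
        phase_spread, phase_spread_deg);
```

Here the phase variation across the band is 3 degrees, compared with a full 360 degree rotation of the carrier phase, which is why the baseband time shift can be neglected.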

Array response and spatial frequency: Under the narrowband assumption, if the complex envelope
at element 1 is $u(t)$, then the complex envelopes at the various elements can be collected into a
vector $u(t)\mathbf{a}$, where

$$\mathbf{a} = \left(1, e^{j\phi}, e^{j2\phi}, \ldots, e^{j(N-1)\phi}\right)^T \quad (8.50)$$

is the array response for a particular AoA. Making the dependence on the AoA $\theta$ explicit for
the linear array, we have $\phi(\theta) = 2\pi d \sin\theta/\lambda$, which yields a corresponding
array response $\mathbf{a}(\theta)$. The linear increase in phase across antenna elements (i.e.,
across space) is analogous to the linear increase of phase across time for a sinusoid. Thus, we
call $\phi = \phi(\theta)$ the spatial frequency corresponding to AoA $\theta$. The collection of
array responses $\{\mathbf{a}(\theta), \theta \in [-\pi, \pi]\}$ as we vary the AoA is termed the
array manifold.
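The array response of a uniform linear array is easy to compute numerically. The following MATLAB sketch defines a small helper for it; the name `array_response` and its argument order are hypothetical choices made here, and the spacing `d` is assumed to be normalized by the wavelength.

```matlab
% Array response of a uniform linear array, returned as a column vector.
% theta: angle relative to broadside (radians)
% N:     number of array elements
% d:     inter-element spacing normalized by the wavelength
array_response = @(theta, N, d) exp(1j*2*pi*d*sin(theta)*(0:N-1)).';

% Example: 4 elements, half-wavelength spacing, AoA of 30 degrees.
% The spatial frequency is phi = 2*pi*0.5*sin(pi/6) = pi/2, so successive
% entries rotate by a quarter turn.
a = array_response(pi/6, 4, 0.5);
disp(a);
```

Every entry has unit magnitude, so the squared norm of the array response equals the number of elements N.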

Reciprocity: While Figure 8.14 depicts an antenna array receiving a wave, exactly the same
reasoning applies to an antenna array emitting a wave. In particular, the principle of reciprocity
tells us that the propagation channel from transmitter to receiver is the same as that from receiver
to transmitter. Thus, the array response of a linear array for angle of arrival $\theta$ is the same
as the array response for angle of departure $\theta$.


[Figure 8.15 labels: Antenna 1, ..., Antenna N; common LO for downconversion]

Figure 8.15: MIMO signal processing architecture. There is one “RF chain” per antenna,
downconverting the signal received at that antenna to I and Q components.


Signal  processing  architecture:  What  the  preceding  complex  baseband  model  means  physically 
is  that,  if  we  downconvert  the  RF  signals  at  the  outputs  of  the  antenna  elements  (using  the 
same LO frequency and phase, and filters with identical responses, in each such “RF chain”),
then  the  complex  envelopes  corresponding  to  the  different  antenna  elements  will  be  related  as 
described  above.  Once  the  I  and  Q  components  for  these  complex  envelopes  are  obtained, 
they  would  typically  be  sampled  and  quantized  using  analog-to-digital  converters  (ADCs),  and 
then  processed  digitally.  Such  a  DSP-centric  signal  processing  architecture,  depicted  in  Figure 
8.15,  allows  the  implementation  of  sophisticated  MIMO  algorithms  in  today’s  cellular  and  WiFi 
systems.  While  the  figure  depicts  a  receiver  architecture,  an  entirely  analogous  block  diagram  can 
be  drawn  for  a  MIMO  transmitter,  simply  by  reversing  the  arrows  and  replacing  downconverters 
by  upconverters. 

While  the  DSP-centric  architecture  depicted  in  Figure  8.15  has  been  key  to  enabling  the  widespread 
deployment  of  low-cost  MIMO  transceivers,  it  may  need  to  be  revisited  as  carrier  frequencies, 
and  the  available  signaling  bandwidths,  scale  up.  Both  the  cost  and  power  consumption  of  ADCs 
with  adequate  precision  can  be  prohibitively  large  at  very  high  sampling  rates,  hence  alternative 
architectures  with  MIMO  processing  done,  wholly  or  in  part,  prior  to  ADC  may  need  to  be 
considered.  See  the  epilogue  for  further  discussion. 


8.4.2  Beamsteering 

Once we know the array response for a given direction, we can maximize the received power
(for a receive antenna array) or the transmitted power (for a transmit antenna array) in that
direction by employing a spatial matched filter or spatial correlator. If the first antenna element
receives a complex baseband waveform (after downconversion and sampling) $s[n]$ from AoA $\theta$,
then the output of the antenna array is modeled as a vector of complex baseband discrete time
signals with $k$th component

$$y_k[n] = s[n] e^{j(k-1)\phi(\theta)} + w_k[n], \quad k = 1, 2, \ldots, N \quad (8.51)$$

where $\phi(\theta)$ is the spatial frequency corresponding to $\theta$, and where $w_k[n]$ are
typically modeled as complex WGN, independent across space and time: $\mathrm{Re}(w_k[n])$ and
$\mathrm{Im}(w_k[n])$ i.i.d. $N(0, \sigma^2)$ for all $k$, $n$. In vector notation, we can write

$$\mathbf{y}[n] = \mathbf{a}(\theta) s[n] + \mathbf{w}[n] \quad (8.52)$$

where $\mathbf{y}[n] = (y_1[n], \ldots, y_N[n])^T$, $\mathbf{w}[n] = (w_1[n], \ldots, w_N[n])^T$, and
$\mathbf{a}(\theta)$ is the array response corresponding to direction $\theta$. We have not
discussed complex WGN in detail in this text, but in analogy with the results in Chapters 5 and 6
for real WGN, it is possible to show that correlation against a noiseless signal template is the
right thing to do. Thus, regardless of the value of the time domain sample $s[n]$, the spatial
processing that maximizes SNR is to correlate against the noiseless template $\mathbf{a}(\theta)$.
That is, we wish to compute the decision statistics

$$Z[n] = \langle \mathbf{y}[n], \mathbf{a}(\theta) \rangle = \mathbf{a}^H(\theta)\, \mathbf{y}[n] \quad (8.53)$$

Correlating the spatial signal against the array response in this fashion is termed beamforming.
The desired signal contribution to the decision statistic obtained from beamforming is
$\|\mathbf{a}(\theta)\|^2 s[n] = N s[n]$. Thus, the signal amplitude gets scaled by a factor of $N$,
and hence the signal power gets scaled by a factor $N^2$. It can be shown that the variance of the
noise contribution to the decision statistic gets amplified by a factor of $N$. Thus, the SNR gets
amplified by a factor of $N$ by beamforming at the receiver. This is called the beamforming gain.
Receive beamforming is also termed maximal ratio combining, because it combines the spatial signal
in a manner that maximizes the signal-to-noise ratio.
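The scaling of the signal and noise contributions can be checked with a short calculation. The sketch below assumes the complex WGN model stated above, so that $E[\mathbf{w}[n]\mathbf{w}^H[n]] = 2\sigma^2 \mathbf{I}$, and writes $\mathrm{SNR}_{\mathrm{in}} = |s[n]|^2/(2\sigma^2)$ for the SNR at a single antenna; the symbols $\mathrm{SNR}_{\mathrm{in}}$ and $\mathrm{SNR}_{\mathrm{out}}$ are introduced here for illustration only.

```latex
\begin{align*}
Z[n] &= \mathbf{a}^H(\theta)\,\mathbf{y}[n]
      = \|\mathbf{a}(\theta)\|^2 s[n] + \mathbf{a}^H(\theta)\,\mathbf{w}[n]
      = N s[n] + \mathbf{a}^H(\theta)\,\mathbf{w}[n] \\
E\left[\left|\mathbf{a}^H(\theta)\,\mathbf{w}[n]\right|^2\right]
     &= \mathbf{a}^H(\theta)\, E\left[\mathbf{w}[n]\mathbf{w}^H[n]\right] \mathbf{a}(\theta)
      = 2\sigma^2 \|\mathbf{a}(\theta)\|^2 = 2\sigma^2 N \\
\mathrm{SNR}_{\mathrm{out}} &= \frac{N^2 |s[n]|^2}{2\sigma^2 N}
      = N\,\frac{|s[n]|^2}{2\sigma^2} = N\,\mathrm{SNR}_{\mathrm{in}}
\end{align*}
```

The signal adds coherently (amplitude $N$), while the independent noise samples add incoherently (variance $N$), which leaves a net SNR gain of $N$.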

Receive beamforming gathers energy coming from a given direction. Conversely, transmit
beamforming can be used to direct energy in a given direction. For example, if a linear transmit
antenna array seeks to direct energy towards an angle of departure $\theta$, then, in order to send
a time domain sample $s[n]$, it should transmit the spatial vector $s[n]\mathbf{a}^H(\theta)$. Since
the spatial channel to the receiver is $\mathbf{a}(\theta)$, the signal received is given by
$s[n]\mathbf{a}^H(\theta)\mathbf{a}(\theta) = N s[n]$. Thus, the received amplitude scales as $N$,
and the received power as $N^2$. Since the noise at the receiver does not get the benefit of this
transmit beamforming gain, transmit beamforming with $N$ antennas leads to an SNR gain of $N^2$
relative to a single antenna system, if we fix the per-antenna emitted power. The signal
transmitted from antenna $k$ is $s[n] e^{-j(k-1)\phi(\theta)}$, which has power $|s[n]|^2$, and
since we have $N$ antenna elements, we are transmitting at $N$ times the power. The additional
factor of $N$ in received power comes from the fact that, by choosing the beamforming coefficients
appropriately, we are ensuring that the signals from these $N$ antenna elements add up in phase
at the receiver, which leads to an $N$-fold gain.

Thus, both transmit and receive beamforming perform spatial matched filtering, leading to a
beamforming gain of $N$. That is, the SNR is enhanced by a factor of $N$. In addition, if each
element in a transmit antenna array transmits at a power equal to that of a reference single
element antenna, then we have an additional power combining gain of $N$ for transmit beamforming,
leading to a net SNR gain of $N^2$.
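The split of the transmit gain into a beamforming gain of N and a power combining gain of N can be illustrated numerically. The MATLAB sketch below is a noiseless toy example under assumed parameters (N = 10, spacing of one third of a wavelength, departure angle of 20 degrees); the channel to the receiver is modeled simply as the array response, as in the discussion above.

```matlab
% Assumed illustrative parameters
N = 10;                   % number of transmit elements
d = 1/3;                  % element spacing normalized by the wavelength
theta = 20*pi/180;        % angle of departure (radians)

a = exp(1j*2*pi*d*sin(theta)*(0:N-1)).';   % array response (column vector)
s = 1;                                     % transmitted sample with |s|^2 = 1

x = s*conj(a);            % sample sent from each antenna: s*exp(-j(k-1)phi)
r = a.'*x;                % noiseless received sample: contributions add in phase

tx_power_total = sum(abs(x).^2);        % N times the single-antenna power
rx_power_gain = abs(r)^2/abs(s)^2;      % N^2 relative to a single antenna

fprintf('Total transmit power: %g, received power gain: %g\n', ...
        tx_power_total, rx_power_gain);
```

With each element emitting the same power as the reference single antenna, the total emitted power grows by N while the received power grows by N squared.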

Beamforming directs energy in a given direction by ensuring that the radio waves emitted or
received from that direction (or their complex envelopes) add constructively, or in phase. The
radio waves in other directions may add constructively or destructively, depending on the array
geometry. Thus, it is of interest to characterize the beam pattern corresponding to a particular
set of beamforming coefficients. If we are beamforming in direction $\theta_0$, then the gain in an
arbitrary direction $\theta$ is given by

$$G(\theta; \theta_0) = |\langle \mathbf{a}(\theta), \mathbf{a}(\theta_0) \rangle| = |\mathbf{a}^H(\theta_0)\, \mathbf{a}(\theta)|$$

The  following  code  fragment  computes  and  plots  the  beam  pattern  for  a  linear  array. 




Code  Fragment  8.4.1  Plotting  beam  patterns  for  a  linear  array 

d=1/3; %normalized inter-element spacing
N=10; %number of array elements
theta0_degrees = 0; %desired angle from broadside in degrees
theta0=theta0_degrees*pi/180; %desired angle from broadside in radians
phi0=2*pi*d*sin(theta0); %desired spatial frequency
a0= exp(j*phi0*[0:N-1]); %array response in desired direction
theta_degrees = -90:1:90; %sweep of angles with respect to broadside
theta = theta_degrees*pi/180; %(angles in radians)
phi = 2*pi*d*sin(theta); %spatial freqs as a function of angle wrt broadside (as a row)
%array responses corresponding to the spatial freqs as columns
array_responses = exp(j*transpose([0:N-1])*phi);
%inner product of desired array response with array responses in other directions
rho = conj(a0)*array_responses;
plot(theta_degrees,10*log10(abs(rho))); %plot gain (dB) versus angle
hold on;
stem(theta0_degrees,10*log10(N),'r'); %indicates desired direction
xlabel('Angle with respect to broadside');
ylabel('Gain (dB)');


Figure  8.16:  Example  beam  patterns  with  a  linear  array,  generated  using  code  fragment  8.4.1. 

Array spacing: In the preceding code fragment, we have set the element spacing at $\lambda/3$. In
Problem 8.10, we explore the effect of varying the element spacing, and in particular, what
happens as the element spacing exceeds $\lambda/2$.
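As one illustrative data point on the role of element spacing (not a solution to Problem 8.10), the MATLAB sketch below compares the gain seen at 90 degrees from broadside when the beam is steered to broadside, for a spacing of one third of a wavelength and for a spacing of a full wavelength. The helper names `a` and `gain` are hypothetical, and the choice of N = 10 and of the two spacings is an assumption made here.

```matlab
N = 10;                     % number of array elements
k = (0:N-1).';              % element indices as a column

% Array response for angle theta (radians) and normalized spacing d
a = @(theta, d) exp(1j*2*pi*d*sin(theta)*k);

% Gain in direction theta when beamforming towards theta0
gain = @(theta, theta0, d) abs(a(theta0, d)'*a(theta, d));

g_third = gain(pi/2, 0, 1/3);   % spacing of lambda/3
g_one   = gain(pi/2, 0, 1);     % spacing of lambda

fprintf('Gain at 90 degrees: %.3f (d = 1/3), %.3f (d = 1)\n', g_third, g_one);
```

With a spacing of one wavelength, the spatial frequency at 90 degrees differs from that at broadside by exactly $2\pi$, so the two directions become indistinguishable to the array and the gain at 90 degrees equals the full main lobe value N.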

Notational convention: We say that we employ beamforming weights or coefficients
$\mathbf{c} = (c_1, \ldots, c_N)^T$ when we apply the coefficient $c_i^*$ to the $i$th antenna
element. For a receive beamformer, if the spatial signal being received is
$\mathbf{y} = (y_1, \ldots, y_N)^T$, then the use of beamforming weights $\mathbf{c}$ corresponds
to computing the inner product
$\langle \mathbf{y}, \mathbf{c} \rangle = \mathbf{c}^H \mathbf{y} = \sum_{i=1}^{N} c_i^* y_i$. With
this convention, the beamforming weights for directing a beam in direction $\theta$ are given by
$\mathbf{c} = \mathbf{a}(\theta)$.

Steering nulls: As we see from Figure 8.16, when we form a beam in a given direction, we
maximize the beam pattern in that direction, creating a main lobe in the beam pattern, while
also generating other local maxima (typically of lower strength) in other directions. The latter
are called sidelobes, and are often small enough compared to the main lobe that we do not worry
about them. Sometimes, however, we want to be extra careful in guaranteeing that power is not
accidentally steered in an undesired direction. For example, a cellular base station employing
a beamforming array to receive a signal from mobile A may wish to null out interference from
mobile B. We can use a ZF approach, analogous to the one discussed in detail in Section 8.2.2.
If mobile A is in direction $\theta_A$ and mobile B in direction $\theta_B$, then we wish to align
$\mathbf{c}$ with $\mathbf{a}(\theta_A)$ as best we can, while staying orthogonal to
$\mathbf{a}(\theta_B)$. Thus, we can choose the beamforming weights to be a scaled version of the
projection of $\mathbf{a}(\theta_A)$ orthogonal to the interference subspace spanned by
$\mathbf{a}(\theta_B)$, which is given by

$$\mathbf{c}_A = \mathbf{a}(\theta_A) - \frac{\langle \mathbf{a}(\theta_A), \mathbf{a}(\theta_B) \rangle}{\langle \mathbf{a}(\theta_B), \mathbf{a}(\theta_B) \rangle}\, \mathbf{a}(\theta_B) \quad (8.54)$$

While  the  ZF  approach  has  the  advantage  of  having  a  clear  geometric  interpretation,  in  practice, 
when  implementing  this  at  the  receiver,  we  may  often  employ  the  MMSE  criterion  (see  Sections 
8.2  and  8.2.2),  which  lends  itself  to  adaptive  implementation. 
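The zero-forcing weights in (8.54) can be computed in a few lines. The MATLAB sketch below uses assumed directions (mobile A at broadside, mobile B at 25 degrees) and an assumed array (N = 10, spacing of one third of a wavelength); the helper `a` is a hypothetical name. Note that the MATLAB operator `'` is the conjugate transpose, so `v'*u` computes the inner product $\langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{v}^H \mathbf{u}$.

```matlab
N = 10; d = 1/3;            % assumed array size and normalized spacing

% Array response for an angle given in degrees from broadside
a = @(theta_deg) exp(1j*2*pi*d*sin(theta_deg*pi/180)*(0:N-1)).';

aA = a(0);                  % desired direction (mobile A)
aB = a(25);                 % interfering direction (mobile B)

% Projection of aA orthogonal to aB, as in (8.54)
cA = aA - ((aB'*aA)/(aB'*aB))*aB;

leak  = abs(cA'*aB);        % response towards B: zero up to rounding error
gainA = abs(cA'*aA);        % response towards A: somewhat less than N

fprintf('Response towards B: %.2e, response towards A: %.3f (N = %d)\n', ...
        leak, gainA, N);
```

The price of placing a null towards B is a small loss in gain towards A, which shrinks as the two array responses become closer to orthogonal.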

We  can  combine  beam  and  null  steering  in  this  fashion  at  the  transmitter  as  well  as  the  receiver. 
There  are  some  additional  issues  when  employing  this  approach  at  the  transmitter.  First,  the 
transmitter  must  know  the  array  responses  corresponding  to  the  different  receivers  it  is  steering 
beams  or  nulls  towards,  which  requires  either  explicit  feedback,  or  implicit  feedback  derived 
from reciprocity. Second, we must scale the weights appropriately depending on constraints on
transmit power: average power scales with $\|\mathbf{c}\|^2$, while peak power scales with $\max_i |c_i|^2$.

Space division multiple access (SDMA): Beamforming and nullforming can enable a single receiver
to receive from multiple transmitters, and conversely, a single transmitter to transmit separate
messages to different receivers, using a common set of time-frequency resources. This is termed
space division multiple access (SDMA). For example, in order to send a message signal $s_A(t)$
to mobile A without interfering with mobile B, and message signal $s_B(t)$ to mobile B without
interfering with mobile A, the transmitter sends the “space-time” signal

$$\mathbf{y}(t) = s_A(t)\, \mathbf{c}_A^* + s_B(t)\, \mathbf{c}_B^* \quad (8.55)$$

where $\mathbf{c}_A$ is the zero-forcing solution in (8.54), and $\mathbf{c}_B$ is a zero-forcing
solution with the roles of A and B interchanged. That is, the signal transmitted from the $i$th
antenna is a linear combination of the two message signals,
$y_i(t) = s_A(t) c_{A,i}^* + s_B(t) c_{B,i}^*$, where $c_{A,i}$ and $c_{B,i}$ denote the $i$th
entries of $\mathbf{c}_A$ and $\mathbf{c}_B$, and where the conjugation of the beamforming weights
is in accordance with the convention discussed earlier. A receiver with an antenna array can use
similar techniques to receive signals from multiple transmitters at the same time. SDMA is
explored further in Problem 8.11.
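The transmit side of SDMA in (8.55) can be sketched as follows for a single time sample. This is a noiseless toy example under assumptions made here: the channel to each single-antenna mobile is modeled as the array response in its direction, the directions and array parameters are arbitrary, and the names `rA`, `rB`, `estA`, and `estB` are hypothetical.

```matlab
N = 10; d = 1/3;            % assumed array size and normalized spacing
a = @(theta_deg) exp(1j*2*pi*d*sin(theta_deg*pi/180)*(0:N-1)).';

aA = a(0);                  % array response towards mobile A
aB = a(25);                 % array response towards mobile B

% Zero-forcing weights: each is orthogonal to the other mobile's response
cA = aA - ((aB'*aA)/(aB'*aB))*aB;
cB = aB - ((aA'*aB)/(aA'*aA))*aA;

sA = 1 + 1j;                % message sample for mobile A
sB = -1 + 1j;               % message sample for mobile B

y = sA*conj(cA) + sB*conj(cB);   % transmitted spatial vector, as in (8.55)

rA = aA.'*y;                % noiseless sample received at mobile A
rB = aB.'*y;                % noiseless sample received at mobile B

estA = rA/(cA'*aA);         % remove the known scalar gain towards A
estB = rB/(cB'*aB);         % remove the known scalar gain towards B

disp([estA, estB]);         % recovers sA and sB up to rounding error
```

Each mobile sees only its own message because the weights used for the other message are orthogonal to its array response.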


8.4.3  Rich  Scattering  and  MIMO-OFDM 

While  Section  8.4.2  focuses  on  beamsteering  and  nullsteering  along  specific  directions,  the  channel 
between  transmitter  and  receiver  may  often  be  characterized  by  a  large  number  of  paths,  possibly 
corresponding  to  different  directions  of  arrival  or  departure.  Indoor  WiFi  channels  are  one 
example  of  such  “rich  scattering”  channels.  Figure  8.17  shows  some  of  the  paths  obtained  from 
two-dimensional  ray  tracing  between  a  transmitter  and  a  receiver  in  a  rectangular  room.  These 
include  all  four  first-order  reflections  (single  bounces)  and  two  second-order  reflections  (two 
bounces).  Not  all  of  these  have  equal  attenuation  (the  attenuation  of  a  path  depends  on  its 
length,  as  well  as  the  angles  of  incidence  and  the  type  of  material  at  each  surface  it  reflects  off 
of),  but  we  can  see  from  the  construction  of  the  second-order  reflections  that  the  number  of 
paths  quickly  becomes  large  as  we  start  accounting  for  multiple  bounces.  Of  course,  the  path 
strengths  start  dying  out  as  the  number  of  bounces  increases,  since  there  is  a  loss  in  strength  for 
each bounce, but for typical indoor environments in the WiFi bands (2.4 and 5 GHz), there are
many paths with nontrivial gains.

Figure 8.17: Ray tracing to determine paths between a transmitter and a receiver inside a
“two-dimensional room.” All first-order reflections, and two second-order reflections, are shown.
The lightly shaded circles depict “virtual sources” employed to perform ray tracing.

Figure 8.18: A typical propagation environment between an elevated base station and a mobile
in urban clutter. The mobile sees a rich scattering environment locally, due to reflections from
building and street surfaces. However, from the base station’s viewpoint, the paths to the mobile
fall within a narrow angular spread.

Exercise: What is the total number of second-order reflections in the scenario depicted in Figure
8.17?

Even in outdoor settings, such as for cellular networks, mobiles in an urban environment may
see rich scattering because of bounces from buildings around them. An elevated base station,
however, may still see a relatively sparse scattering environment. Such a situation is depicted in
Figure 8.18. Since the base station sees a narrow angular spread, it may be able to employ
beamforming strategies effectively (e.g., forming a beam along the “mean” angle of
arrival/departure). However, the mobile transceiver must account for the rich scattering
environment that it sees.

At this point, the reader is encouraged to quickly review Section 2.9. As we noted there, a
multipath channel has a transfer function which is “frequency-selective” (i.e., it varies with
frequency). Now that we have multiple antennas, each antenna sees a frequency-selective channel,
so that the net array response is frequency-selective. However, we can model the array response as
constant for a small enough frequency slice (smaller than the coherence bandwidth; see the
discussion in Section 2.9). OFDM (see Section 8.3) naturally decomposes the channel into such
slices, and each subcarrier in a MIMO-OFDM system may see a different array response. Thus, we can
apply MIMO processing in parallel to each subcarrier after downconversion and OFDM processing, as
shown in Figure 8.19.

Figure 8.19: Typical MIMO-OFDM receiver architecture. After downconverting and sampling
the received signal from each antenna, we apply OFDM processing to separate out the subcarriers.
After the FFT, the samples for a given subcarrier, say k, from the different antennas are collected
together for per-subcarrier MIMO processing. Thus, each subcarrier sees a different narrowband
MIMO channel.
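The per-subcarrier processing described above can be sketched in MATLAB for a toy case. The example below is a simplified illustration under several assumptions made here: a single transmit antenna, two receive antennas, no noise, two-tap channels with arbitrary coefficients, and a cyclic prefix whose effect is modeled as circular convolution. All variable names are hypothetical.

```matlab
K = 8;                            % number of subcarriers
N = 2;                            % number of receive antennas
B = [1 -1 1j -1j 1 1 -1 -1].';    % frequency domain symbols, one per subcarrier
b = ifft(B);                      % time domain samples of one OFDM symbol

h = [1, 0.5; 0.8j, -0.3];         % row n: two-tap channel seen by antenna n

% Per-antenna OFDM processing: after cyclic prefix removal, the channel acts
% as a circular convolution, and the FFT separates the subcarriers.
Y = zeros(N, K);
for n = 1:N
    r = h(n,1)*b + h(n,2)*circshift(b, 1);   % received time domain samples
    Y(n,:) = fft(r).';                       % row n: subcarrier samples at antenna n
end

Hf = fft(h, K, 2);                % N x K: column k is the array response on subcarrier k

% Per-subcarrier spatial processing: receive beamforming with the
% subcarrier-specific array response, normalized to unit gain.
Bhat = zeros(K, 1);
for k = 1:K
    hk = Hf(:,k);
    Bhat(k) = (hk'*Y(:,k))/(hk'*hk);
end

disp(max(abs(Bhat - B)));         % reconstruction error (rounding error only)
```

In this sketch the columns of `Hf` differ across subcarriers, which is the sense in which each subcarrier sees its own narrowband spatial channel.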


Focusing on a single subcarrier in a MIMO-OFDM system (this model also applies to narrowband
signaling with bandwidth smaller than the channel coherence bandwidth), consider a link with
$M$ transmit antennas and $N$ receive antennas. Over a subcarrier, the channel from transmit
element $m$ to receive element $n$ is a complex-valued scalar, which we denote by $H_{nm}$. If the
transmitter sends a complex symbol $x_m$ from antenna $m$, then the $n$th receive antenna sees the
linear combination

$$y_n = \sum_{m=1}^{M} H_{nm} x_m + w_n \quad (8.56)$$

where $w_n$ is the complex-valued noise seen at the $n$th receive antenna. The preceding can be
written in matrix-vector notation as

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{w} \quad (8.57)$$

where $\mathbf{y} = (y_1, \ldots, y_N)^T$ is the received vector, $\mathbf{x} = (x_1, \ldots, x_M)^T$
is the transmitted vector, and $\mathbf{H}$ is the $N \times M$ channel matrix, whose $m$th column
is the receive array response seen by the $m$th transmit element.

Noise model: The complex-valued noise $w_n$ is typically modeled as follows: $\mathrm{Re}(w_n)$,
$\mathrm{Im}(w_n)$ are i.i.d. $N(0, \sigma^2)$, and are independent across receive antennas. The
noise vector $\mathbf{w}$ is said to be a complex Gaussian random vector which is completely
characterized by its mean $E[\mathbf{w}] = \mathbf{0}$ and covariance matrix
$\mathbf{C}_w = E\left[(\mathbf{w} - E[\mathbf{w}])(\mathbf{w} - E[\mathbf{w}])^H\right] = 2\sigma^2 \mathbf{I}$.
Its distribution is denoted by $\mathbf{w} \sim CN(\mathbf{0}, 2\sigma^2 \mathbf{I})$, and the
distribution of any entry is specified as $w_n \sim CN(0, 2\sigma^2)$.

Remark on notation: According to our convention (which is consistent with most literature in
the field), for an $M \times N$ MIMO system (i.e., with $M$ transmit antennas and $N$ receive
antennas), the channel matrix $\mathbf{H}$ is an $N \times M$ matrix. The reason for this choice of
convention is that we like working with column vectors: $\mathbf{x}$ is the $M \times 1$ column
vector of symbols transmitted from the different transmit antennas, the $m$th column of
$\mathbf{H}$ is the receiver’s spatial response to the $m$th transmit antenna, and $\mathbf{y}$ is
the $N \times 1$ column vector of received samples.
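The dimension convention is easy to confirm with a small numerical example. In the MATLAB sketch below, the channel entries and symbols are arbitrary illustrative values chosen here, and the noise is switched off so that the structure of the matrix-vector product is visible.

```matlab
M = 2;                            % number of transmit antennas
N = 3;                            % number of receive antennas

H = [1, 1j; 0.5, -1; -1j, 0.2];   % N x M channel matrix (arbitrary values)
x = [1; -1];                      % M x 1 vector of transmitted symbols
w = zeros(N, 1);                  % N x 1 noise vector (set to zero here)

y = H*x + w;                      % N x 1 vector of received samples

% The received vector is a linear combination of the columns of H, that is,
% of the receive array responses seen by the individual transmit antennas.
y_cols = x(1)*H(:,1) + x(2)*H(:,2);

disp(size(H)); disp(size(y));
```

The two ways of computing the received vector agree, which reflects the interpretation of each column of the channel matrix as the spatial response to one transmit antenna.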

Operations  such  as  beamforming  and  nullforming  can  now  be  performed  separately  for  each 
subcarrier.  However,  these  operations  are  no  longer  associated  with  directing  energy  or  nulls 
towards  particular  physical  directions,  since  the  spatial  response  in  each  subchannel  is  a  linear 
combination  of  array  responses  associated  with  many  directions.  A  particularly  simple  model  for 
the  resulting  channel  gains  for  a  given  subcarrier  is  described  next. 

Rich scattering model: The path gains $H(n, m)$ for a given subcarrier are a function of the
channel impulse responses between each transmit/receive pair, but are often modeled statistically
in order to provide quick insights into design tradeoffs in a manner that is independent of the
specific propagation geometry. We now discuss a particularly simple model, motivated by “rich
scattering environments” in which there are a large number of paths of roughly equal strength
between the transmitter and receiver. Let $h = H(n, m)$ denote the complex gain between a
typical transmit/receive antenna pair. We can write

$$h = \sum_{i=1}^{L} A_i e^{j\theta_i}$$

where $L$ is the number of paths, and where $A_i \geq 0$, $\theta_i \in [0, 2\pi]$ are the amplitude
and phase of the $i$th complex-valued path gain for the given subcarrier. We therefore have

$$\mathrm{Re}(h) = \sum_{i=1}^{L} A_i \cos\theta_i, \quad \mathrm{Im}(h) = \sum_{i=1}^{L} A_i \sin\theta_i$$

If the differences between the lengths of the different paths are comparable to, or larger than,
a carrier wavelength (which is typically the case even for WiFi links indoors, and certainly for
cellular links outdoors), then we can model the phases $\theta_i$ as i.i.d. uniform over
$[0, 2\pi]$. Now, if the amplitudes for the different paths are roughly comparable, then we can
apply the central limit theorem to approximate the joint distribution of $\mathrm{Re}(h)$ and
$\mathrm{Im}(h)$ as i.i.d. $N\left(0, \sum_{i=1}^{L} A_i^2 / 2\right)$.

Let us now normalize $\sum_{i=1}^{L} A_i^2 = 1$ without loss of generality; we can scale the noise
variance to adjust the average SNR: $\overline{\mathrm{SNR}} = \frac{E[|h|^2]}{2\sigma^2} = \frac{1}{2\sigma^2}$
for the model. We can therefore model $h$ as a zero mean complex Gaussian random variable:
$h \sim CN(0, 1)$. Furthermore, for rich scattering environments, it is assumed that the phases
seen by different transmit/receive antenna pairs are sufficiently different that we can model the
gains $H(n, m)$ as i.i.d. $CN(0, 1)$ for different transmit/receive antenna pairs $(m, n)$.
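The central limit argument can be checked by simulation. The MATLAB sketch below is a minimal Monte Carlo experiment under assumptions chosen here: L = 20 paths of exactly equal amplitude, normalized so that the squared amplitudes sum to one, with i.i.d. uniform phases.

```matlab
L = 20;                           % number of paths (assumed)
trials = 1e5;                     % number of independent channel realizations

A = ones(L, 1)/sqrt(L);           % equal path amplitudes with sum(A.^2) = 1
theta = 2*pi*rand(L, trials);     % i.i.d. phases, uniform over [0, 2*pi]

h = (A.')*exp(1j*theta);          % 1 x trials: sum of the L path gains

mean_power = mean(abs(h).^2);     % should be close to 1
var_re = var(real(h));            % should be close to 1/2
var_im = var(imag(h));            % should be close to 1/2
corr_re_im = mean(real(h).*imag(h));   % should be close to 0

fprintf('E|h|^2 = %.3f, var(Re) = %.3f, var(Im) = %.3f, E[Re*Im] = %.3f\n', ...
        mean_power, var_re, var_im, corr_re_im);
```

The empirical moments match those of a $CN(0, 1)$ random variable; a histogram of `real(h)` could be compared against a Gaussian density to examine the quality of the approximation for smaller L.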


8.4.4  Diversity 


When the transmitter and receiver each have only one antenna ($M = N = 1$), under the rich
scattering model, the SNR is given by

$$\mathrm{SNR} = \frac{|h|^2}{2\sigma^2} \quad (8.58)$$

Since $h$ is a random variable, so is the SNR. In fact, since $\mathrm{Re}(h)$ and $\mathrm{Im}(h)$
are i.i.d. $N(0, \frac{1}{2})$, the sum of their squares is an exponential random variable (see
Problems 5.11 and 5.21). Taking into account the scaling by $2\sigma^2$, we can show that SNR is an
exponential random variable with mean equal to the average SNR
$\overline{\mathrm{SNR}} = \frac{1}{2\sigma^2}$. If we now design our coded modulation strategy
for a nominal SNR of $\mathrm{SNR}_0$, we say that the system is in outage when the SNR is smaller
than this value. The probability of outage is given by

$$P_{out} = P[\mathrm{SNR} < \mathrm{SNR}_0] = 1 - e^{-\mathrm{SNR}_0/\overline{\mathrm{SNR}}} \quad (8.59)$$

We would typically choose the nominal SNR, $\mathrm{SNR}_0$, to be smaller than the average SNR,
$\overline{\mathrm{SNR}}$, by a link margin. For example, for a link margin of 10 dB, we have
$\mathrm{SNR}_0 = 0.1\, \overline{\mathrm{SNR}}$, so that $P_{out} = 1 - e^{-0.1} \approx 0.1$ (for
$|x|$ small, $e^x \approx 1 + x$, so that $1 - e^{-x} \approx x$). Thus, even after giving up 10 dB
in link margin, we still get a relatively high outage rate of 10%. Of course, there is a nontrivial
probability that the SNR with fading is higher than the nominal, hence we can have negative link
margins if we are willing to live with large enough outage rates. For example, a link margin of
-3 dB corresponds to $\mathrm{SNR}_0 = 2\, \overline{\mathrm{SNR}}$, with outage rate
$P_{out} = 1 - e^{-2} = 0.865$ (too high for most practical applications).
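The outage formula (8.59) is simple to tabulate. The MATLAB sketch below evaluates it for a few link margins chosen here for illustration, using the fact that a link margin of L dB corresponds to a ratio of nominal to average SNR of $10^{-L/10}$.

```matlab
link_margin_dB = [10 3 0 -3];           % link margins to evaluate (assumed values)

ratio = 10.^(-link_margin_dB/10);       % SNR0 divided by the average SNR
Pout = 1 - exp(-ratio);                 % outage probability from (8.59)

for i = 1:length(link_margin_dB)
    fprintf('Link margin %3d dB: outage probability %.4f\n', ...
            link_margin_dB(i), Pout(i));
end
```

The value for -3 dB differs slightly from 0.865 because -3 dB corresponds to a ratio of about 1.995 rather than exactly 2.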

In  order  to  reduce  the  outage  rate  without  increasing  the  link  margin,  we  must  employ  diversity, 
which  is  a  generic  term  used  for  any  strategy  that  gets  multiple,  approximately  independent, 
“looks”  at  a  fading  channel.  We  saw  diversity  in  action  for  our  simulation-based  model  in 
Software  Lab  2.2.  We  now  explore  it  for  the  rich  scattering  model,  skipping  some  details  in  the 
derivation  in  the  interest  of  arriving  quickly  at  the  key  insights. 

Benchmark: We continue to define our link margin relative to an unfaded single input single
output (SISO) system with average SNR of $\overline{\mathrm{SNR}} = \frac{1}{2\sigma^2}$.

Receive diversity: Consider a receiver equipped with two antennas ($N = 2$). If they are spaced
far enough apart in a rich scattering environment, we can assume that the channel gains (for a
given subcarrier) seen by the two antennas are i.i.d. $CN(0, 1)$ random variables. The received
samples at the two antennas are modeled as

$$y_n = h_n x + w_n, \quad n = 1, 2$$

where $x$ is the transmitted symbol, $h_1, h_2 \sim CN(0, 1)$ are i.i.d. (independent Rayleigh
fading), and $w_1, w_2 \sim CN(0, 2\sigma^2)$ are i.i.d. (independent noise samples). The optimal
decision statistic is obtained using receive beamforming, and is given by

$$Z = h_1^* y_1 + h_2^* y_2 \quad (8.60)$$

It can be shown that the SNR is now given by

$$\mathrm{SNR} = G\, \overline{\mathrm{SNR}} \quad (8.61)$$

where the gain relative to the benchmark SISO system is given by

$$G = |h_1|^2 + |h_2|^2 \quad (8.62)$$

We can break this up into two gains: $G_{diversity} = \frac{|h_1|^2 + |h_2|^2}{2}$ due to averaging
channel fluctuations across antennas, and $G_{coh} = 2$ due to averaging noise across antennas (the
signal terms are being combined coherently, so that the phases line up, while the noise terms are
being combined incoherently, across the two receive antennas). Thus,

$$G = G_{diversity}\, G_{coh}, \quad G_{diversity} = \frac{|h_1|^2 + |h_2|^2}{2}, \quad G_{coh} = 2 \quad (8.63)$$

Equations (8.61) and (8.63) generalize directly to $N$ receive antennas:

$$G = |h_1|^2 + \cdots + |h_N|^2 = G_{diversity}\, G_{coh}, \quad G_{diversity} = \frac{|h_1|^2 + \cdots + |h_N|^2}{N}, \quad G_{coh} = N \quad (8.64)$$

As $N$ gets large, the fluctuations due to fading get smoothed away, and $G_{diversity} \to 1$ by
the law of large numbers. In practice, however, even small values of $N$ (e.g., $N = 2, 4$) give
significant performance gains.
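The SNR expression in (8.61) can be verified by conditioning on the channel gains. The sketch below assumes unit symbol energy, $|x|^2 = 1$ (an assumption made here, consistent with the benchmark average SNR of $1/(2\sigma^2)$), uses the $N$-antenna version of the combiner in (8.60), and introduces the symbol $\tilde{w}$ for the combined noise.

```latex
\begin{align*}
Z &= \sum_{n=1}^{N} h_n^* y_n
   = \left(\sum_{n=1}^{N} |h_n|^2\right) x + \sum_{n=1}^{N} h_n^* w_n
   = G\,x + \tilde{w} \\
E\left[|\tilde{w}|^2 \,\middle|\, h_1,\ldots,h_N\right]
  &= \sum_{n=1}^{N} |h_n|^2\, E\left[|w_n|^2\right] = 2\sigma^2 G \\
\mathrm{SNR} &= \frac{G^2 |x|^2}{2\sigma^2 G} = \frac{G}{2\sigma^2} = G\,\overline{\mathrm{SNR}}
\end{align*}
```

The second line uses the independence of the noise samples across antennas, so that the cross terms vanish in the expectation.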


Figure  8.20:  Probability  of  outage  versus  link  margin  (dB)  for  receive  diversity  in  1  x  N  MIMO 
systems. 


Suppose now that we design our coding and modulation so as to provide reliable performance at
a nominal SNR, say $\mathrm{SNR}_0$, which is smaller than the SISO benchmark
$\overline{\mathrm{SNR}}$ by a link margin of $L$ dB:
$\mathrm{SNR}_0(\mathrm{dB}) = \overline{\mathrm{SNR}}(\mathrm{dB}) - L(\mathrm{dB})$. The
probability of outage is given by

$$P_{out} = P[\mathrm{SNR} < \mathrm{SNR}_0] = P\left[G < 10^{-L/10}\right] \quad (8.65)$$

Figure  8.20  plots  the  outage  probability  as  a  function  of  link  margin  for  several  different  values 
of  N.  The  plots  are  obtained  using  the  procedure  outlined  in  Problem  8.12. 
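One way to estimate the outage probability in (8.65) is by Monte Carlo simulation, sketched below in MATLAB. This is not necessarily the procedure of Problem 8.12; it is an illustration under assumptions chosen here (a link margin of 10 dB, and N = 1, 2, 4). The closed form used for comparison relies on the standard fact that a sum of N i.i.d. unit-mean exponential random variables has an Erlang distribution, which is stated here as an assumption rather than taken from the text.

```matlab
trials = 2e5;                     % number of channel realizations per value of N
Nvals = [1 2 4];                  % numbers of receive antennas to compare
L_dB = 10;                        % link margin in dB (assumed)
thresh = 10^(-L_dB/10);           % outage occurs when G falls below this value

Pout = zeros(size(Nvals));        % Monte Carlo estimates
Pout_analytic = zeros(size(Nvals));   % Erlang closed form for comparison

for i = 1:length(Nvals)
    N = Nvals(i);
    % i.i.d. CN(0,1) channel gains, one column per realization
    h = (randn(N, trials) + 1j*randn(N, trials))/sqrt(2);
    G = sum(abs(h).^2, 1);        % gain G = |h_1|^2 + ... + |h_N|^2
    Pout(i) = mean(G < thresh);
    kk = 0:N-1;
    Pout_analytic(i) = 1 - exp(-thresh)*sum(thresh.^kk./factorial(kk));
end

disp([Nvals; Pout; Pout_analytic]);
```

Very small outage probabilities (as for N = 4 here) need many more trials than used in this sketch to be estimated reliably by simulation, which is one reason a closed form or a tail approximation is useful.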

Transmit diversity: If the transmitter has multiple antennas, it can beamform towards the
receiver if it has implicit or explicit feedback regarding the channel, as already noted. When
such feedback is not available, we would like to use open loop strategies which provide diversity.
Consider a transmitter with two antennas communicating with a receiver with a single antenna
($M = 2$, $N = 1$). In a MIMO-OFDM system, for a given subcarrier, suppose that transmit
antenna 1 sends the sample $x_1$ and transmit antenna 2 sends the sample $x_2$. If the transmitter
knows the channel coefficients $h_1$ and $h_2$ from the two transmit antennas to the receive antenna,
then it could choose $x_1 = h_1^* x$, $x_2 = h_2^* x$, where x is the symbol to be transmitted. What do we
do when $h_1$ and $h_2$ are unknown? In general, if we send $x_1$, $x_2$ from the two transmit elements,
then the received sample is given by


$$y = h_1 x_1 + h_2 x_2 + w \tag{8.66}$$

where w is noise. Thus, if $x_1$ and $x_2$ are two independent symbols, then they interfere with each
other at the receiver. On the other hand, if we set $x_1 = x_2 = x/\sqrt{2}$ (normalizing the transmit
power across the two antennas to that of a transmitter with a single antenna), then we receive

$$y = \frac{h_1 + h_2}{\sqrt{2}} \, x + w$$

If $h_1$, $h_2$ are i.i.d. $CN(0, 1)$, it is easy to show that the effective channel coefficient
$h_{\text{eff}} = \frac{h_1 + h_2}{\sqrt{2}}$ is also $CN(0, 1)$. Thus, we still have Rayleigh fading, and have not made any progress relative to
a  single  antenna  transmitter!  An  ingenious  solution  to  this  problem  is  the  Alamouti  space-time 
code  (named  after  its  inventor),  which  resolves  the  interference  between  the  signals  sent  by  the 
two transmit antennas over two time samples. Let b[1] and b[2] be two symbols to be transmitted.
For  a  single  antenna  transmitter,  they  would  be  transmitted  in  sequence.  For  the  two  antenna 
transmitter  now  being  considered,  let  us  expand  the  signal  space  dimension  to  two  at  the  receiver 
by  considering  two  successive  time  samples.  This  allows  us  to  orthogonalize  the  contributions  of 
these two symbols at the receiver. Denoting by $x_i[1]$ and $x_i[2]$, $i = 1, 2$, the samples transmitted
from antenna i at two successive time intervals, we set

$$x_1[1] = b[1]/\sqrt{2}, \qquad x_2[1] = b[2]/\sqrt{2}$$

$$x_1[2] = -b^*[2]/\sqrt{2}, \qquad x_2[2] = b^*[1]/\sqrt{2}$$


where  we  have  again  normalized  the  net  transmit  power  to  that  of  a  single  antenna  system. 
Figure  8.21  depicts  the  operation  of  the  Alamouti  space-time  code,  taking  a  sequence  of  symbols 
as  input,  and  mapping  them  in  groups  of  two  to  a  sequence  of  samples  at  the  output  of  each 
antenna. 


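[Figure 8.21 diagram: the input sequence b[1], b[2], b[3], b[4], ... enters the Alamouti space-time code block, which outputs b[1], -b*[2], b[3], -b*[4], ... to antenna 1 and b[2], b*[1], b[4], b*[3], ... to antenna 2.]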


Figure 8.21: The transmitter in an Alamouti space-time code takes two symbols at a time, and
maps them to two consecutive symbols to be sent from each of the two transmit antennas. The
input to the space-time encoder is the sequence of symbols to be transmitted, $\{b[n]\}$, while the
outputs are the sequences $\{x_i[n]\}$, $i = 1, 2$, to be transmitted from antenna i. The $1/\sqrt{2}$ factor
for power normalization is omitted from the figure.


The received samples in the two successive time intervals are given by

$$y[1] = h_1 x_1[1] + h_2 x_2[1] + w[1] = \frac{h_1}{\sqrt{2}} b[1] + \frac{h_2}{\sqrt{2}} b[2] + w[1]$$

$$y[2] = h_1 x_1[2] + h_2 x_2[2] + w[2] = -\frac{h_1}{\sqrt{2}} b^*[2] + \frac{h_2}{\sqrt{2}} b^*[1] + w[2]$$

We assume that the receiver has estimates of the channel coefficients $h_1$ and $h_2$ (e.g., using known
training signals). We would like to write the two observations as a received vector in which each
symbol modulates a different signal vector. Since the symbols are conjugated when sent over the
second time interval, we conjugate the second received sample when creating the received vector.
This yields the following vector model:

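$$\mathbf{y} = \begin{pmatrix} y[1] \\ y^*[2] \end{pmatrix} = b[1] \, \frac{1}{\sqrt{2}} \begin{pmatrix} h_1 \\ h_2^* \end{pmatrix} + b[2] \, \frac{1}{\sqrt{2}} \begin{pmatrix} h_2 \\ -h_1^* \end{pmatrix} + \begin{pmatrix} w[1] \\ w^*[2] \end{pmatrix} = b[1] \mathbf{u}_1 + b[2] \mathbf{u}_2 + \mathbf{w} \tag{8.69}$$

The vectors $\mathbf{u}_1 = \frac{1}{\sqrt{2}} (h_1, h_2^*)^T$ and $\mathbf{u}_2 = \frac{1}{\sqrt{2}} (h_2, -h_1^*)^T$ are orthogonal (i.e., $\mathbf{u}_1^H \mathbf{u}_2 = 0$), regardless
of the values of the channel coefficients, hence the symbols b[1] and b[2] do not interfere with
each other. The vector $\mathbf{w} \sim CN(\mathbf{0}, 2\sigma^2 \mathbf{I})$. The optimal decision statistic Z[i] for symbol b[i] is
given by matched filtering against $\mathbf{u}_i$:

$$Z[i] = \mathbf{u}_i^H \mathbf{y}, \quad i = 1, 2 \tag{8.70}$$

Exercise: Write out these decision statistics explicitly in terms of y[1], y[2], $h_1$, $h_2$.
Answer: $Z[1] = h_1^* y[1] + h_2 y^*[2]$, $Z[2] = h_2^* y[1] - h_1 y^*[2]$ (up to scale).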
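
The vector model (8.69) and the decision statistics (8.70) can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, the channel gains and QPSK symbols are arbitrary example values, and noise is omitted so that the orthogonality of u_1 and u_2 is easy to see.

```python
import math


def alamouti_encode(b1, b2):
    """Return ((x1[1], x2[1]), (x1[2], x2[2])) for one pair of symbols."""
    s = 1.0 / math.sqrt(2.0)  # power normalization across the two antennas
    return (s * b1, s * b2), (-s * b2.conjugate(), s * b1.conjugate())


def inner(u, v):
    """Complex inner product u^H v for 2-vectors."""
    return u[0].conjugate() * v[0] + u[1].conjugate() * v[1]


def alamouti_receive(y1, y2, h1, h2):
    """Decision statistics Z[i] = u_i^H y with y = (y[1], conj(y[2]))^T."""
    s = 1.0 / math.sqrt(2.0)
    u1 = (s * h1, s * h2.conjugate())
    u2 = (s * h2, -s * h1.conjugate())
    y = (y1, y2.conjugate())
    return inner(u1, y), inner(u2, y), u1, u2


# Arbitrary example values (assumptions, not taken from the text).
h1, h2 = 0.3 - 1.1j, -0.7 + 0.4j
b1, b2 = (1 + 1j) / math.sqrt(2), (1 - 1j) / math.sqrt(2)

(x11, x21), (x12, x22) = alamouti_encode(b1, b2)
y1 = h1 * x11 + h2 * x21  # first time interval, noiseless
y2 = h1 * x12 + h2 * x22  # second time interval, noiseless
z1, z2, u1, u2 = alamouti_receive(y1, y2, h1, h2)

gain = (abs(h1) ** 2 + abs(h2) ** 2) / 2  # ||u_1||^2 = ||u_2||^2
print("u1^H u2 =", inner(u1, u2))
print("Z[1] / gain =", z1 / gain, " Z[2] / gain =", z2 / gain)
```

In the noiseless case each statistic is the corresponding symbol scaled by (|h_1|^2 + |h_2|^2)/2, with no contribution from the other symbol.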

The  SNR  seen  by  each  symbol  is  given  by 


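$$\text{SNR}_{\text{Alamouti}} = \frac{|h_1|^2 + |h_2|^2}{2} \cdot \frac{\sigma_b^2}{2\sigma^2} = G_{\text{Alamouti}} \, \text{SNR} \tag{8.71}$$

where

$$G_{\text{Alamouti}} = \frac{|h_1|^2 + |h_2|^2}{2} \tag{8.72}$$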


Comparing  with  (8.64)  for  receive  diversity,  we  see  that  the  Alamouti  scheme  in  a  2  x  1  system 
achieves  the  same  diversity  gain  as  receive  diversity  in  a  1  x  2  system,  but  does  not  provide  the 
coherent  gain  obtained  from  averaging  across  receive  antennas  in  the  latter.  Of  course,  as  we  see 
in  Problem  8.13  and  in  Software  Lab  8.3,  the  Alamouti  scheme  applies  to  2  x  N  MIMO  systems 
for  arbitrary  N,  so  that  we  can  get  noise  averaging  and  receive  diversity  gains  for  N  >  1.  The 
outage  rates  computed  in  Problem  8.13  are  plotted  in  Figure  8.22. 


Figure  8.22:  Probability  of  outage  versus  link  margin  (dB)  for  Alamouti  space-time  coding  for 
2  x  N  MIMO  with  rich  scattering. 


The  simplicity  of  the  Alamouti  construction  (and  its  optimality  for  2x1  MIMO)  has  led  to  its 
adoption  by  a  number  of  cellular  and  WiFi  standards  (just  do  an  Internet  search  to  see  this). 
Unfortunately,  the  orthogonalization  provided  by  the  Alamouti  space-time  code  does  not  scale 
to  more  than  two  transmit  antennas.  There  are  a  number  of  “quasi-orthogonal”  constructions 
that  have  been  investigated,  but  so  far,  these  have  had  less  impact  on  practice.  Indeed,  when 
there  are  a  large  number  of  transmit  antennas,  the  trend  is  to  engineer  the  system  so  that 
the  transmitter  has  enough  information  about  the  channel  to  perform  some  form  of  transmit 
beamforming (possibly using multiple beams).

8.4.5 Spatial multiplexing


We  have  already  seen  that  a  transceiver  with  multiple  antennas  can  use  SDMA  to  communicate 
with  multiple  nodes  at  different  locations.  For  example,  a  cellular  base  station  with  multiple 
antennas  can  use  the  same  time-frequency  resources  to  send  data  streams  in  parallel  to  different 
mobile  devices  (even  if  such  devices  only  have  one  antenna  each).  When  both  transmitter  and 
receiver  have  multiple  antennas,  if  the  propagation  environment  is  “rich  enough,”  then  multiple 
parallel  data  streams  can  be  sent  between  transmitter  and  receiver.  This  is  termed  spatial 
multiplexing.  Figure  8.23  depicts  spatial  multiplexing  with  two  antennas,  modeling,  for  example, 
one  subcarrier  in  a  MIMO-OFDM  system.  The  per-stream  symbol  rate  1/T  is  the  rate  of  sending 
symbols  along  a  subcarrier,  where  T  is  the  length  of  an  OFDM  symbol.  With  M-fold  spatial 
multiplexing,  the  aggregate  symbol  rate  for  a  given  subcarrier  is  M/T.  This  should  be  scaled  up 
by  the  number  of  subcarriers  to  get  the  overall  symbol  rate. 


[Figure 8.23 diagram: the input sequence b[1], b[2], b[3], b[4], ... (aggregate symbol rate 2/T) is split into the subsequences b[1], b[3], ... and b[2], b[4], ..., each at per-stream symbol rate 1/T.]


Figure 8.23: For spatial multiplexing, the transmitter may take a sequence of incoming symbols
$\{b[n]\}$, and do a serial-to-parallel conversion to map them to subsequences to be transmitted from
the different antennas. In the example shown, the odd symbols are transmitted from antenna 1,
and the even symbols from antenna 2. The aggregate symbol rate is twice the per-stream symbol
rate.


For example, suppose that the transmitter and receiver in a MIMO-OFDM system each have
two antennas (M = N = 2). For a given subcarrier in an OFDM system, consider a particular
time interval. Suppose that the transmitter sends $x_1$ from antenna 1 and $x_2$ from antenna 2
(referring to Figure 8.23, $x_1 = b[1]$ and $x_2 = b[2]$ in the first time interval). The samples at the
two receive elements are given by

$$y_1 = H_{11} x_1 + H_{12} x_2 + w_1$$

$$y_2 = H_{21} x_1 + H_{22} x_2 + w_2$$

which we can write in vector form as

$$\mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = x_1 \begin{pmatrix} H_{11} \\ H_{21} \end{pmatrix} + x_2 \begin{pmatrix} H_{12} \\ H_{22} \end{pmatrix} + \begin{pmatrix} w_1 \\ w_2 \end{pmatrix} = x_1 \mathbf{u}_1 + x_2 \mathbf{u}_2 + \mathbf{w} \tag{8.73}$$

where $\mathbf{u}_1$ is the response seen by transmit element 1 at the two receive antennas, $\mathbf{u}_2$ is the
response seen by transmit element 2 at the receive antennas, and $\mathbf{w} \sim CN(\mathbf{0}, 2\sigma^2 \mathbf{I})$ is complex
WGN. While we have considered a 2 x 2 MIMO system for illustration, the model is generally
applicable for 2 x N MIMO systems with $N \geq 2$, with $\mathbf{u}_1$ and $\mathbf{u}_2$ denoting the two columns
of the channel matrix $\mathbf{H}$, corresponding to the received responses for each of the two transmit
antennas, respectively.

In a MIMO-OFDM system, we have eliminated interference across subcarriers using OFDM, but
we have introduced interference in space by sending multiple symbols from different antennas.
The vector spatial interference model (8.73) for each subcarrier is analogous to the vector time
domain interference model for ISI in singlecarrier systems discussed in Chapter 6. Just as we can
compensate for ISI using a time-domain equalizer if the time domain channel has appropriate
characteristics, we can compensate for spatial interference using a spatial equalizer if the spatial
channel has appropriate characteristics. For example, if $\mathbf{u}_1$ and $\mathbf{u}_2$ are linearly independent,
then we can use linear ZF or MMSE techniques to demodulate the symbols $x_1$ and $x_2$. Thus,
if  there  are  M  parallel  data  streams  being  sent  from  the  transmit  antennas,  then  we  need  at 
least  M  receive  antennas  in  order  to  obtain  a  signal  space  of  large  enough  dimension  for  the 
linear  independence  condition  to  be  satisfied.  Indeed,  it  can  be  shown  more  generally  (without 
restricting  ourselves  to  ZF  or  MMSE  techniques)  that,  for  rich  scattering  models,  the  capacity 
of  an  M  x  N  MIMO  channel  scales  as  min(M,  N),  the  minimum  of  the  number  of  transmit  and 
receive  antennas. 

The ZF and MMSE receivers have been discussed in detail in Section 8.2.2. For our purpose, the
relevant expressions are those for complex-valued signals in (8.35). We reproduce the expression
for the ZF correlator here before adapting it to our present purpose.

$$\mathbf{c}_{ZF} = \mathbf{C}_w^{-1} \mathbf{U} \left( \mathbf{U}^H \mathbf{C}_w^{-1} \mathbf{U} \right)^{-1} \mathbf{e}$$

Recall that $\mathbf{U}$ is a matrix containing the signal vectors as columns, and that $\mathbf{e}$ is a unit vector
with nonzero entry corresponding to the desired vector $\mathbf{u}_0$ in the ISI vector model (8.13). In
our spatial multiplexing model (8.73), $\mathbf{U} = \mathbf{H}$ (the signal vectors are simply the columns of
the channel matrix), the noise covariance $\mathbf{C}_w = 2\sigma^2 \mathbf{I}$, and we wish to demodulate the data
corresponding to both of the signal vectors. Letting $\mathbf{e}_1 = (1, 0)^T$ and $\mathbf{e}_2 = (0, 1)^T$, the ZF
correlators for the two streams can be written as (dropping scale factors corresponding to the
noise variance)

$$\mathbf{c}_1 = \mathbf{H} \left( \mathbf{H}^H \mathbf{H} \right)^{-1} \mathbf{e}_1, \qquad \mathbf{c}_2 = \mathbf{H} \left( \mathbf{H}^H \mathbf{H} \right)^{-1} \mathbf{e}_2$$

We can represent this compactly as a single ZF matrix $\mathbf{C}_{ZF} = [\mathbf{c}_1 \; \mathbf{c}_2]$ containing these correlators
as columns. Noting that $[\mathbf{e}_1 \; \mathbf{e}_2] = \mathbf{I}$, we obtain that

$$\mathbf{C}_{ZF} = \mathbf{H} \left( \mathbf{H}^H \mathbf{H} \right)^{-1}, \quad \text{ZF matrix for spatial demultiplexing} \tag{8.74}$$

The decision statistics for the multiplexed streams are given by

$$\mathbf{Z} = \mathbf{C}_{ZF}^H \mathbf{y} \tag{8.75}$$

While we have focused on the 2 x 2 example (8.73) in this derivation, it applies in general to M
spatially multiplexed streams in an M x N MIMO system with $N \geq M$. The outage rate with
zero-forcing  reception  for  2x2  and  2  x  4  is  plotted  in  Figure  8.24,  using  software  developed  in 
Problem  8.14. 
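
As an illustration of (8.74) and (8.75), the following sketch computes the ZF matrix for an N x 2 channel using plain Python lists, so that the 2 x 2 inverse of H^H H is written out explicitly. The helper names and the example channel are assumptions made for this sketch; it is not the software of Problem 8.14.

```python
def hermitian(a):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[a[i][j].conjugate() for i in range(len(a))]
            for j in range(len(a[0]))]


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def inv2(m):
    """Inverse of a 2 x 2 matrix; fails if u_1 and u_2 are dependent."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if abs(det) < 1e-12:
        raise ValueError("H^H H is singular: columns of H are dependent")
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def zf_matrix(h):
    """C_ZF = H (H^H H)^{-1} for an N x 2 channel matrix H, as in (8.74)."""
    return matmul(h, inv2(matmul(hermitian(h), h)))


def zf_demultiplex(h, y):
    """Decision statistics Z = C_ZF^H y, as in (8.75)."""
    c_h = hermitian(zf_matrix(h))
    return [sum(c_h[i][n] * y[n] for n in range(len(y))) for i in range(2)]


# Hypothetical 2 x 2 channel and symbols, noiseless for clarity.
H = [[1.0 + 0.5j, 0.2 - 0.3j],
     [-0.4 + 0.1j, 0.9 + 0.7j]]
x = [1 + 1j, -1 + 1j]
y = [H[n][0] * x[0] + H[n][1] * x[1] for n in range(2)]
z = zf_demultiplex(H, y)
print("recovered symbols:", z)
```

Without noise, the ZF statistics reproduce the transmitted symbols exactly, since C_ZF^H H = I whenever the two columns of H are linearly independent.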

The  MMSE  receiver  can  be  similarly  derived  to  be 

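$$\mathbf{C}_{MMSE} = \left( \mathbf{H} \mathbf{H}^H + 2\sigma^2 \mathbf{I} \right)^{-1} \mathbf{H}, \quad \text{MMSE matrix for spatial demultiplexing} \tag{8.76}$$

where we have normalized the transmitted symbols to unit energy ($E[|b[n]|^2] = \sigma_b^2 = 1$).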
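
A small numerical sketch of (8.76) for a 2 x 2 channel follows, again using plain Python lists. The helper names, the example channel, and the noise levels are assumptions chosen for illustration. The sketch compares the effective response C^H H of the MMSE and ZF receivers: ZF removes the cross-stream interference completely, while MMSE leaves some residual interference at moderate noise levels and approaches ZF as the noise variance shrinks.

```python
def hermitian(a):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[a[i][j].conjugate() for i in range(len(a))]
            for j in range(len(a[0]))]


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def inv2(m):
    """Inverse of a 2 x 2 matrix (assumed nonsingular)."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def mmse_matrix(h, noise_var):
    """C_MMSE = (H H^H + noise_var I)^{-1} H for 2 x 2 H; noise_var = 2 sigma^2."""
    r = matmul(h, hermitian(h))
    r[0][0] += noise_var
    r[1][1] += noise_var
    return matmul(inv2(r), h)


def zf_matrix(h):
    """C_ZF = H (H^H H)^{-1}, as in (8.74)."""
    return matmul(h, inv2(matmul(hermitian(h), h)))


# Hypothetical example channel.
H = [[1.0 + 0.5j, 0.2 - 0.3j],
     [-0.4 + 0.1j, 0.9 + 0.7j]]

# Effective 2 x 2 response C^H H seen by the two streams.
mmse_response = matmul(hermitian(mmse_matrix(H, 1.0)), H)
zf_response = matmul(hermitian(zf_matrix(H)), H)
print("ZF cross term:", abs(zf_response[0][1]))
print("MMSE cross term at 2 sigma^2 = 1:", abs(mmse_response[0][1]))
```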

It  is  interesting  to  compare  the  spatial  multiplexing  model  (8.73)  with  the  diversity  model  (8.69) 
for  the  Alamouti  space-time  code.  The  Alamouti  code  does  not  rely  on  the  receiver  having 
multiple  antennas,  and  therefore  uses  time  to  create  enough  dimensions  for  two  symbols  to  be 
sent in parallel. Furthermore, the vectors $\mathbf{u}_1$ and $\mathbf{u}_2$ in the Alamouti model (8.69) are constructed
such that they are orthogonal regardless of the propagation channel. In contrast, the spatial
multiplexing model (8.73) relies on nature to provide vectors $\mathbf{u}_1$ and $\mathbf{u}_2$ that are “different
enough”  to  support  two  parallel  symbols.  We  explore  these  differences  in  Software  Lab  8.3.  The 
spectral  efficiency  of  spatial  multiplexing  is  twice  that  of  the  Alamouti  code,  but  the  diversity 
gain  that  it  sees  is  smaller,  as  is  evident  from  a  comparison  of  the  outage  rate  versus  link  margin 
curves  in  Figures  8.22  and  8.24.  It  is  possible  to  systematically  quantify  the  tradeoff  between 
diversity and multiplexing, but this is beyond our scope here.

Figure 8.24: Outage rate versus link margin (with respect to the SISO benchmark) for 2 x N
spatial multiplexing (N = 2, 4) with zero-forcing reception.


8.5  Concept  Summary 


This  chapter  begins  with  an  introduction  to  modeling  and  equalization  for  communication  over 
dispersive  channels,  including  singlecarrier  and  OFDM  modulation.  All  models  and  algorithms 
are  developed  in  complex  baseband,  so  that  upconversion  and  downconversion  are  not  explicitly 
modeled. 

Modeling  of  singlecarrier  systems 

•  Symbols  in  a  linearly  modulated  system  pass  through  a  cascade  of  the  transmit,  channel  and 
receive  filters,  where  the  cascade  typically  does  not  satisfy  the  Nyquist  criterion  for  ISI  avoidance. 
Any  technique  for  handling  the  resulting  ISI  is  termed  equalization. 

•  Receiver  noise  is  modeled  as  AWGN  passed  through  the  receive  filter. 

• Eye diagrams enable visualization of the effect of various ISI patterns, and equalization
techniques are needed for reliable demodulation if the eye is closed.

Linear  equalization 

•  The  decision  statistic  for  a  given  symbol  is  computed  by  a  linear  operation  on  a  vector  of 
received  samples  in  an  observation  interval  that  is  typically  chosen  to  be  large  enough  that  it 
contains a significant contribution from the symbol of interest. Observation intervals for
successive symbols are offset by the symbol time, so that the statistics of the desired symbol and the
ISI  are  identical  across  observation  intervals. 

•  The  linear  MMSE  equalizer  minimizes  the  MSE  between  the  decision  statistic  and  the  desired 
symbol,  and  also  maximizes  the  SINR  over  the  class  of  linear  equalizers. 

•  The  MSE  and  the  MMSE  equalizer  can  be  expressed  in  terms  of  statistical  averages,  hence 
the  MMSE  equalizer  can  be  computed  adaptively  by  replacing  statistical  averages  by  empirical 
averages.  Such  adaptive  implementation  requires  transmission  of  a  known  training  sequence. 

•  The  received  vector  over  an  observation  interval  is  the  sum  of  the  desired  symbol  modulating 
a  desired  vector,  interfering  symbols  modulating  interference  vectors,  plus  a  noise  vector.  This 
vector  ISI  model  can  be  characterized  completely  if  we  know  the  transmit  filter,  the  channel 
filter, the receive filter, and the noise statistics at the input to the receive filter.

• Explicit analytical formulas can be given for the ZF and MMSE equalizer, and the associated
SINRs, once the vector ISI model is specified.

•  At  high  SNR,  the  MMSE  equalizer  converges  to  the  ZF  equalizer,  which  (for  white  noise) 
can  be  interpreted  geometrically  as  projecting  the  received  vector  orthogonal  to  the  interference 
subspace spanned by the interference vectors, thus nulling out the ISI while incurring noise
enhancement. The ZF equalizer exists only if the desired vector is linearly independent of the
interference  vectors. 

• The geometric interpretation and analytical formulas for the ZF and MMSE equalizers
developed for white noise can be extended to colored noise, with the derivation using the concept of
noise  whitening. 

OFDM 

• Since complex exponentials are eigenfunctions of any LTI system, multiple complex
exponentials, each modulated by a complex-valued symbol, do not interfere with each other when
transmitted  through  a  dispersive  channel.  Each  complex  exponential  simply  gets  scaled  by  the 
channel  transfer  function  at  that  frequency.  The  task  of  equalization  corresponds  to  undoing  this 
complex  gain  in  parallel  for  each  complex  exponential.  This  is  the  conceptual  basis  for  OFDM, 
which  enables  parallelization  of  the  task  of  equalization  even  for  very  complicated  channels  by 
transmitting  along  subcarriers. 

•  OFDM  can  be  implemented  efficiently  in  DSP  using  an  IDFT  at  the  transmitter  (frequency 
domain  symbols  to  time  domain  samples)  and  a  DFT  at  the  receiver  (time  domain  samples  to 
frequency domain observations, which are the symbols scaled by channel gain and corrupted by
noise).

•  In  order  to  maintain  orthogonality  across  subcarriers  (required  for  parallelization  of  the  task 
of equalization) when we take the DFT at the receiver, the effect of the channel on the
transmitted samples must be that of a circular convolution. Since the physical channel corresponds
to  linear  convolution,  OFDM  systems  emulate  circular  convolution  by  inserting  a  cyclic  prefix  in 
the  transmitted  time  domain  samples. 
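
The three OFDM points above can be illustrated with a short sketch: an IDFT at the transmitter, a cyclic prefix, linear convolution with the channel, and a DFT at the receiver. The function names, the direct O(n^2) DFT, and the example channel taps are assumptions made for this illustration; a practical implementation would use an FFT. Noise is omitted.

```python
import cmath


def dft(x):
    """Direct DFT of a list of samples."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Direct inverse DFT (frequency domain symbols to time domain samples)."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def ofdm_through_channel(symbols, taps):
    """Send one OFDM symbol through an FIR channel; return subcarrier outputs."""
    n, cp = len(symbols), len(taps) - 1
    time_samples = idft(symbols)
    # Prepend a cyclic prefix as long as the channel memory.
    tx = time_samples[n - cp:] + time_samples
    # The physical channel performs a linear convolution.
    rx = [sum(taps[l] * tx[m - l] for l in range(len(taps))
              if 0 <= m - l < len(tx))
          for m in range(len(tx) + cp)]
    # Drop the prefix, keep n samples, and return to the frequency domain.
    return dft(rx[cp:cp + n])


symbols = [1, -1, 1j, -1j, 1 + 1j, -1 - 1j, 1 - 1j, -1 + 1j]
taps = [1.0, 0.5, -0.2j]  # hypothetical 3-tap dispersive channel
outputs = ofdm_through_channel(symbols, taps)
gains = dft(taps + [0.0] * (len(symbols) - len(taps)))
for k in range(len(symbols)):
    print(k, outputs[k], gains[k] * symbols[k])
```

Each subcarrier output equals the transmitted symbol scaled by the channel's DFT coefficient at that subcarrier, so equalization reduces to undoing one complex gain per subcarrier.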

The  second  part  of  the  chapter  provides  an  initial  exposure  to  how  multiple  antennas  at  the 
transmitter  and  receiver  (i.e.,  MIMO,  or  space-time  techniques)  can  be  employed  to  enhance 
the  performance  of  wireless  systems.  Three  key  techniques,  which  in  practice  are  combined  in 
various  ways,  are  beamforming,  diversity  and  spatial  multiplexing. 

Beamforming  and  Nullforming 

• The array response for a linear array can be viewed as a mapping from the angle of
arrival/departure to a spatial frequency.

• For an N-element array, spatial matched filtering, or beamforming, leads to a factor of N gain in
SNR. For transmit beamforming in which each antenna element is transmitting at a fixed power,
we obtain an additional power combining gain of N.

•  By  forming  a  beam  at  a  desired  transceiver  and  nulls  at  other  transceivers,  an  antenna  array 
can  support  SDMA. 
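
The factor of N beamforming gain can be checked with a short sketch. It assumes a uniform linear array with half-wavelength spacing and angles measured from broadside; these conventions, and the function names, are illustrative assumptions rather than definitions taken from this summary.

```python
import cmath
import math


def array_response(num_elements, theta_rad, spacing_wavelengths=0.5):
    """Response of a uniform linear array to a plane wave at angle theta."""
    # The angle maps to a spatial frequency (phase increment per element).
    phase = 2.0 * math.pi * spacing_wavelengths * math.sin(theta_rad)
    return [cmath.exp(1j * phase * n) for n in range(num_elements)]


def beamforming_gain(weights, response):
    """SNR gain |w^H a|^2 / ||w||^2 over a single element, for white noise."""
    corr = sum(w.conjugate() * a for w, a in zip(weights, response))
    norm_sq = sum(abs(w) ** 2 for w in weights)
    return abs(corr) ** 2 / norm_sq


N = 8
a = array_response(N, math.radians(20.0))
print("matched gain:", beamforming_gain(a, a))  # spatial matched filter
broadside = array_response(N, 0.0)
off_beam = array_response(N, math.radians(30.0))
print("gain when steered to 0 deg, signal at 30 deg:",
      beamforming_gain(broadside, off_beam))
```

With weights matched to the array response the gain equals the number of elements, while a beam steered elsewhere can place a null on the signal, which is the mechanism that SDMA exploits.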

MIMO-OFDM  abstraction 

•  Decomposing  a  time  domain  channel  into  subcarriers  using  OFDM  allows  a  simple  model  for 
MIMO  systems,  in  which  the  channel  between  each  pair  of  transmit  and  receive  antennas  is 
modeled  as  a  single  complex  gain  for  each  subcarrier. 

•  When  the  propagation  environment  is  complex  enough,  the  central  limit  theorem  motivates 
modeling  the  channel  gains  between  transmit-receive  antenna  pairs  as  i.i.d.  zero  mean  complex 
Gaussian random variables. We term this the rich scattering model.

•  Under  the  rich  scattering  model,  each  transmit-receive  antenna  pair  sees  Rayleigh  fading,  but 
performance  degradation  due  to  fading  can  be  alleviated  using  diversity. 

Diversity 

•  Diversity  strategies  average  over  fades  by  exploiting  roughly  independent  looks  at  the  channel. 

• Receive spatial diversity using spatial matched filtering provides a channel averaging gain
(averaging the fading gains across antennas) and a noise averaging gain (due to coherent combining
across antennas).

• Transmit spatial diversity provides channel averaging gains alone. It is trickier than receive
diversity, since samples transmitted from different transmit antennas can interfere at the receiver.
For  two  transmit  antennas,  the  Alamouti  space-time  code  is  an  optimal  scheme  for  avoiding 
interference  between  different  transmitted  symbols,  while  providing  channel  averaging  gains. 

Spatial  multiplexing 

•  Sending  parallel  data  streams  from  different  antennas  increases  the  symbol  rate  proportional 
to  the  number  of  data  streams,  with  space  playing  a  role  analogous  to  bandwidth. 

•  The  parallel  data  streams  interfere  at  the  receiver,  but  can  be  demodulated  using  spatial 
equalization  techniques  analogous  to  the  time  domain  equalization  techniques  studied  in  Section 
8.2  (e.g.,  suboptimal  techniques  such  as  ZF  and  MMSE). 


8.6  Endnotes 


While we have shown that ISI in singlecarrier systems can be handled using linear equalization,
significant performance improvements can be obtained using nonlinear strategies, including
optimal maximum likelihood sequence estimation (MLSE), whose complexity is often prohibitive
for long channels and/or large constellations, as well as suboptimal strategies such as decision
feedback equalization (DFE), whose complexity is comparable to that of linear equalization. An
introduction to such strategies, as well as pointers for further reading, can be found in more
advanced communication theory texts such as [7, 8].

OFDM  has  now  become  ubiquitous  in  both  wireless  and  wireline  communication  systems  in  recent 
years,  because  it  provides  a  standardized  mechanism  for  parallelizing  equalization  of  arbitrarily 
complicated  channels  in  a  way  that  leverages  the  dropping  cost  and  increasing  speed  of  digital 
computation.  For  more  detail  than  we  have  presented  here,  we  refer  to  the  relevant  chapters  in 
books  on  wireless  communication  by  Goldsmith  [46]  and  Tse  and  Viswanath  [47].  These  should 
provide  the  background  required  to  access  the  huge  research  literature  on  OFDM,  which  focuses 
on  issues  such  as  channel  estimation,  synchronization  and  reduction  of  PAPR. 

There  has  been  an  explosion  of  research  and  development  activity  in  MIMO,  or  space-time 
communication,  starting  from  the  1990s:  this  is  the  decade  in  which  the  large  capacity  gains 
provided  by  spatial  multiplexing  were  pointed  out  by  Foschini  [48]  and  Telatar  [49],  and  the 
Alamouti space-time code was published by Alamouti [50]. MIMO techniques have been
incorporated into 3G and 4G (WiMax and LTE) cellular standards, and WiFi (IEEE 802.11n)
standards.  An  excellent  reference  for  exploring  MIMO-OFDM  further  is  the  textbook  by  Tse 
and  Viswanath  [47],  while  a  brief  introduction  is  provided  in  Chapter  8  of  Madhow  [7].  Other 
books  devoted  to  MIMO  include  Paulraj  et  al  [51],  Jafarkhani  [52],  and  the  compilation  edited 
by  Bolcskei  et  al  [53]. 

As  discussed  in  the  epilogue,  a  new  frontier  in  MIMO  is  opening  up  with  research  and  development 
for  wireless  communication  systems  at  higher  carrier  frequencies,  starting  with  the  “millimeter 
wave”  band  (i.e.,  carrier  frequencies  in  the  range  30-300  GHz,  for  which  the  wavelength  is  in  the 
range  1-10  mm).  Of  particular  interest  is  the  60  GHz  band,  where  there  is  a  huge  amount  (7 
GHz!)  of  unlicensed  spectrum,  in  contrast  to  the  crowding  in  existing  cellular  and  WiFi  bands. 
While  fundamental  MIMO  concepts  such  as  beamforming,  diversity,  and  spatial  multiplexing 
still  apply,  the  order  of  magnitude  smaller  carrier  wavelength  and  the  order  of  magnitude  larger 
bandwidth  requires  fundamentally  rethinking  many  aspects  of  link  and  network  design,  as  we 
briefly indicate in the epilogue.

8.7 Problems


ZF  and  MMSE  equalization:  modeling  and  numerical  computations 

Problem  8.1  (Noise  enhancement  computations)  Consider  the  ISI  vector  models  in  (8.10), 
(8.12)  and  (8.11). 

(a) Compute the noise enhancements (dB) for the three equalizer lengths in these models,
assuming white noise.

(b) Now, assume that the noise w[n] is colored, with $\mathbf{C}_w$ specified as follows:

$$C_w(i, j) = \begin{cases} \sigma^2, & i = j \\ \sigma^2/2, & |i - j| = 1 \\ 0, & \text{else} \end{cases} \tag{8.77}$$

Compute  the  noise  enhancements  for  the  three  equalizer  lengths  considered,  and  compare  with 
your  results  in  (a). 

Problem  8.2  (Noise  enhancement  as  a  function  of  correlator  length)  Now,  consider  the 
discrete  time  channel  model  leading  to  ISI  vector  models  in  (8.10),  (8.12)  and  (8.11). 

(a)  Assuming  white  noise,  compute  and  plot  the  noise  enhancement  (dB)  as  a  function  of  equalizer 
length,  for  L  ranging  from  4  to  16,  increasing  the  observation  interval  by  two  by  adding  one  sample 
to  each  side  of  the  current  observation  interval,  and  starting  from  an  observation  interval  of  length 
L  =  4  lined  up  with  the  impulse  response  for  the  desired  symbol.  Does  the  noise  enhancement 
decrease  monotonically?  Does  it  plateau?  (b)  Repeat  for  colored  noise  as  in  Problem  8.1(b). 

Problem 8.3 (MMSE correlator and SINR computations) Consider the ISI vector model
(8.10).

(a) Assume $\mathbf{C}_w = \sigma^2 \mathbf{I}$, and define $\text{SNR} = \|\mathbf{u}_0\|^2 / \sigma^2$ as the MF bound on achievable SNR. For
SNR of 6 dB, compute the MMSE correlator and the corresponding SINR (dB), using (8.34)
and  (8.16).  Check  that  the  results  match  the  alternative  formula  (8.36).  Compare  with  the  SNR 
achieved  by  the  ZF  correlator. 

(b)  Plot  the  SINR  (dB)  of  the  MMSE  and  ZF  correlators  as  a  function  of  the  MF  SNR  (dB). 
Comment  on  any  trends  that  you  notice. 


[Figure 8.25 diagram: cascade of transmit filter, channel filter, and receive filter.]


Figure  8.25:  Continuous  time  model  for  a  link  with  ISI. 


Problem 8.4 (From continuous time to discrete time vector ISI model) In this problem,
we discuss an example of how to derive the vector ISI model (8.13) from a continuous time model,
using the system shown in Figure 8.25. The symbol rate 1/T = 1, the input to the transmit
filter is $\sum_n b[n] \delta(t - nT)$, where $b[n] \in \{-1, 1\}$. Thus, the continuous time noiseless signal at
the output of the receive filter is $\sum_n b[n] q(t - nT)$, where $q(t) = (g_{TX} * g_C * g_{RX})(t)$ is the
system response to a single symbol. The noise n(t) at the input to the receive filter is WGN
with PSD $\sigma^2 = N_0/2$, so that the noise $w(t) = (n * g_{RX})(t)$ at the output of the receive filter, using
the material in Section 5.9, is zero mean, WSS, Gaussian with autocorrelation/autocovariance
function $R_w(\tau) = C_w(\tau) = \sigma^2 (g_{RX} * g_{RX,mf})(\tau)$.

(a) Sketch the end-to-end response q(t). Compute the energy per bit $E_b = \|q\|^2$.

Remark: Note that $\frac{E_b}{N_0} = \frac{\|q\|^2}{2\sigma^2}$. If we fix the signal scaling, and hence $\|q\|^2$, then the value of $\sigma^2$
is fixed once we specify $E_b/N_0$.

(b)  Assume  that  the  sampler  operates  at  rate  2/T  =  2,  taking  samples  at  times  t  =  m/2  for 
integer  m.  Show  that  the  discrete  time  end-to-end  response  to  a  single  symbol  (i.e.  the  sampled 
version  of  q(t))  is 

h=  (-,0,1, 2, -1,-1,  A, 0,...) 

(c) For the given sampling rate, show that, in the vector ISI model (8.13), the noise covariance matrix
satisfies (8.77).

(d)  Specify  the  matrix  U  corresponding  to  the  ISI  model  (8.13)  for  a  linear  equalizer  of  length 
5,  with  observation  interval  lined  up  with  the  channel  response  for  the  desired  symbol. 

(e)  Taking  into  account  the  noise  coloring,  compute  the  optimal  ZF  correlator,  and  its  noise 
enhancement  relative  to  the  matched  filter  bound. 

(f) Compute the MMSE correlator for $E_b/N_0$ of 10 dB (see (a) and the associated remark). What
is  the  output  SINR,  and  how  does  it  compare  with  the  SNR  of  the  ZF  correlator  in  (e)? 

ZF  and  MMSE  equalization:  theoretical  derivations 

Problem 8.5 (ZF geometry) For white noise, the output of a ZF correlator satisfying (8.17)
and (8.18) is given by

$$\mathbf{c}^T \mathbf{r}[n] = b[n] + N(0, \sigma^2 \|\mathbf{c}\|^2)$$

Since the signal scaling is fixed, the optimal ZF correlator is one that minimizes the noise variance
$\sigma^2 \|\mathbf{c}\|^2$. Thus, the optimal ZF correlator minimizes $\|\mathbf{c}\|^2$ subject to (8.17) and (8.18).

(a) Suppose a correlator $\mathbf{c}_1$ satisfies (8.17) and (8.18), and is a linear combination of the signal
vectors $\{\mathbf{u}_k\}$. Now, suppose that we add a component $\Delta\mathbf{c}$ orthogonal to the space spanned by
the signal vectors. Show that $\mathbf{c}_2 = \mathbf{c}_1 + \Delta\mathbf{c}$ is also a ZF correlator.

(b) How is the output noise variance for $\mathbf{c}_2$ related to that for $\mathbf{c}_1$?

(c) Conclude that, in order to be optimal, a ZF correlator must lie in the signal subspace spanned
by $\{\mathbf{u}_k\}$.

(d) Observe that the condition (8.17) implies that $\mathbf{c}$ must be orthogonal to the interference
subspace spanned by $\{\mathbf{u}_k, k \neq 0\}$. Combining with (c), infer that $\mathbf{c}$ must be a scalar multiple of
$\mathbf{P}_I^\perp \mathbf{u}_0$.


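Problem 8.6 (Invertibility requirement for computing ZF correlator) The ZF correlator
expression (8.22) requires inversion of the matrix $\mathbf{U}^T \mathbf{U}$ of correlations among the signal vectors,
with (i, j)th entry $\langle \mathbf{u}_i, \mathbf{u}_j \rangle = \mathbf{u}_i^T \mathbf{u}_j$.

(a) Show that

$$\mathbf{U}^T \mathbf{U} \mathbf{a} = \mathbf{0} \tag{8.78}$$

if and only if

$$\mathbf{U} \mathbf{a} = \mathbf{0} \tag{8.79}$$

Hint: In order to show that (8.78) implies (8.79), suppose that (8.78) holds. Multiply by $\mathbf{a}^T$ and
show that you get an expression of the form $\mathbf{x}^T \mathbf{x} = \|\mathbf{x}\|^2 = 0$, which implies $\mathbf{x} = \mathbf{0}$.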

(b)  Use  the  result  in  (a)  to  infer  that  UTU  is  invertible  if  and  only  if  the  signal  vectors  {u*,}  are 
linearly  independent. 
Problem 8.7 (Alternative computation of the ZF correlator) An alternative computation
for the ZF correlator is by developing an expression for $P_I^\perp u_0$ in terms of $u_0$ and $U_I$, a
matrix containing the interference vectors $\{u_k, k \neq 0\}$ as columns. That is, $U_I$ is obtained from
the signal matrix $U$ by deleting the column corresponding to the desired vector $u_0$. Let us define
the projection of $u_0$ onto the interference subspace by $P_I u_0$. By definition, this is a linear
combination of the interference vectors, and can be written as

$P_I u_0 = U_I a_I$    (8.80)

The orthogonal projection $P_I^\perp u_0$ is therefore given by

$P_I^\perp u_0 = u_0 - P_I u_0 = u_0 - U_I a_I$    (8.81)

(a) Note that $P_I^\perp u_0$ must be orthogonal to each of the interference vectors $\{u_k, k \neq 0\}$, hence
(going directly to the general complex-valued setting)

$U_I^H P_I^\perp u_0 = 0$

(b) Infer from (a) that

$a_I = (U_I^H U_I)^{-1} U_I^H u_0$

(c) Derive the following explicit expression for the orthogonal projection

$P_I^\perp u_0 = u_0 - U_I (U_I^H U_I)^{-1} U_I^H u_0$    (8.82)

(d) Derive the following explicit expression for the energies of the projection onto the interference
subspace and the orthogonal projection:

$\|P_I u_0\|^2 = u_0^H U_I (U_I^H U_I)^{-1} U_I^H u_0$

$\|P_I^\perp u_0\|^2 = \|u_0\|^2 - u_0^H U_I (U_I^H U_I)^{-1} U_I^H u_0 = u_0^H \left( I - U_I (U_I^H U_I)^{-1} U_I^H \right) u_0$    (8.83)

(e) Note that (8.82) and (8.83), together with (8.27), give us an expression for a ZF correlator
$c_{ZF}$ scaled such that $\langle c_{ZF}, u_0 \rangle = 1$.
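
The projection formulas in this problem are easy to check numerically. The following is a minimal sketch, not a solution: the vectors u0 and UI are hypothetical values chosen only for illustration, and the final scaling (dividing the orthogonal projection by its energy) is one choice that yields unit inner product with u0.

```matlab
% Hypothetical desired vector and interference vectors (as columns)
u0 = [1; 2; 0; 1];
UI = [1 0; 1 1; 0 1; 0 2];

% Coefficients of the projection onto the interference subspace, as in part (b)
aI = (UI'*UI) \ (UI'*u0);

% Orthogonal projection, as in (8.82)
u0_perp = u0 - UI*aI;

% Orthogonality to every interference vector, as in part (a)
ortho_check = norm(UI'*u0_perp);

% Energy of the orthogonal projection, computed two ways as in (8.83)
energy_perp = norm(u0_perp)^2;
energy_formula = real(u0'*(eye(4) - UI*((UI'*UI)\UI'))*u0);

% One scaling giving <c_zf, u0> = 1 (an assumption made for this sketch)
c_zf = u0_perp/energy_perp;
```

Using the backslash operator instead of an explicit inverse is a common numerical choice; it solves the same linear system.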


Problem 8.8 (Analytical expression for MMSE correlator) Let us derive the expression
(8.34) for the MMSE correlator for the vector ISI model (8.13). We consider the general scenario
of complex-valued symbols and signals. Suppose that the symbols $\{b[n]\}$ in the model are
uncorrelated, satisfying

$E[b[n] b^*[m]] = \begin{cases} \sigma_b^2, & n = m \\ 0, & n \neq m \end{cases}$    (8.84)

We have

$R = E[r[n] r^H[n]] = E\left[ \left( \sum_k b[n+k] u_k + w[n] \right) \left( \sum_l b[n+l] u_l + w[n] \right)^H \right]$

$p = E[b^*[n] r[n]] = E\left[ b^*[n] \left( \sum_k b[n+k] u_k + w[n] \right) \right]$

Now use (8.84), and the independence of the symbols and the noise, to infer that

$R = \sigma_b^2 \sum_k u_k u_k^H + C_w$,    $p = \sigma_b^2 u_0$
Problem 8.9 (ZF correlator for colored noise) Consider the model (8.13) where the noise
covariance is a positive definite matrix $C_w$. We now derive the formula (8.30) for the ZF correlator,
by mapping, via a linear transformation, the system to a white noise setting for which we
have already derived the ZF correlator in (8.22). Specifically, suppose that we apply an invertible
matrix $A$ to the received vector $r[n]$, then we obtain a transformed received vector

$\tilde{r}[n] = A r[n] = \sum_k b[n+k] \tilde{u}_k + \tilde{w}[n]$    (8.85)

where

$\tilde{u}_k = A u_k$    (8.86)

$\tilde{w}[n] = A w[n] \sim N(0, A C_w A^T)$    (8.87)

(a) Suppose that we find a linear correlator $\tilde{c}$ for the transformed system (8.85), leading to a
decision statistic $Z[n] = \tilde{c}^T \tilde{r}[n]$. Show that we can write the decision statistic $Z[n] = c^T r[n]$ (i.e.,
as the output of a linear correlator operating on the original received vector), where

$c = A^T \tilde{c}$    (8.88)

(b) Suppose that we can find $A$ such that the noise in the transformed system is white:

$A C_w A^T = I$    (8.89)

Show that

$C_w^{-1} = A^T A$    (8.90)

(c) Show that the optimal ZF correlator $\tilde{c}$ for the transformed system is given by

$\tilde{c}_{ZF} = \tilde{U} (\tilde{U}^T \tilde{U})^{-1} e$    (8.91)

(d) Show that the optimal ZF correlator $c_{ZF}$ in the original system is given by (8.30).

Hint: Use (8.86), (8.88) and (8.90).

(e) While we have used whitening only as an intermediate step to deriving the formula (8.30) for
the ZF correlator in the original system, we note for completeness that a whitening matrix $A$
satisfying (8.89) is guaranteed to exist for any positive definite $C_w$, and spell out two possible
choices for $A$. For example, we can take $A = B^{-1}$, where $B$ is the square root of $C_w$, which
is a symmetric matrix satisfying $C_w = B^2$. Another, often more numerically stable, choice is
$A = L^{-1}$, where $L$ is a lower triangular matrix obtained by the Cholesky decomposition of $C_w$,
which satisfies $C_w = L L^T$. Matlab functions implementing these are given below:

%square root of Cw

B=sqrtm(Cw); %symmetric matrix

%Cholesky decomposition of Cw

L=chol(Cw,'lower'); %lower triangular matrix

Throughout the preceding problem, replacing transpose by conjugate transpose gives the
corresponding results for the complex-valued setting. The Matlab code segment above applies for
both real- and complex-valued noise.
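
As a quick numerical check of (8.89) and (8.90), the following sketch applies both choices of whitening matrix to a small covariance matrix. The matrix Cw below is a hypothetical example (real, symmetric and diagonally dominant, hence positive definite), not one taken from the text.

```matlab
% Hypothetical positive definite noise covariance
Cw = [2 0.5 0; 0.5 1 0.3; 0 0.3 1.5];

% Choice 1: inverse of the symmetric square root
B = sqrtm(Cw);
A1 = inv(B);

% Choice 2: inverse of the lower triangular Cholesky factor
L = chol(Cw,'lower');
A2 = inv(L);

% Both choices should whiten the noise, as in (8.89)
err_sqrt = norm(A1*Cw*A1' - eye(3));
err_chol = norm(A2*Cw*A2' - eye(3));

% Check (8.90) for the Cholesky-based choice
err_inv = norm(inv(Cw) - A2'*A2);
```

All three error norms should be at the level of floating point roundoff.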
MIMO


Problem  8.10  (Effect  of  array  spacing)  Consider  a  regular  linear  array  with  N  elements 
and  inter-element  spacing  d. 

(a)  For  N  =  8,  plot  the  beam  pattern  for  a  beam  directed  at  30°  from  broadside  for  d  —  j. 

(b) Repeat (a) for $d = 2\lambda$.

(c)  Comment  on  any  differences  that  you  notice  in  the  beamforming  patterns  in  (a)  and  (b). 

(d) For inter-element spacings of $\alpha\lambda$, the maximum of the beam pattern is not unique as $\alpha$ gets
larger. That is, the beam pattern takes its maximum value not just in the desired direction, but
also in a few other directions. These other maxima are called grating lobes. What is the value of
$\alpha$ beyond which you expect to see grating lobes?
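
A beam pattern of this kind can be computed along the following lines. This is a sketch under stated assumptions: it uses the common convention that element n of the array response at angle theta (measured from broadside) has phase 2*pi*(d/lambda)*n*sin(theta), and the helper names array_resp and beam are hypothetical. The spacing of half a wavelength is chosen here only for comparison and is not necessarily the value intended in part (a).

```matlab
N = 8;                      % number of array elements
theta0 = 30*pi/180;         % desired beam direction (radians from broadside)

% Array response for normalized spacing d/lambda (assumed convention)
array_resp = @(theta, d_over_lambda) ...
    exp(1j*2*pi*d_over_lambda*sin(theta)*(0:N-1)');

% Beam pattern: magnitude of the correlation with the response at theta0
beam = @(theta, d_over_lambda) ...
    abs(array_resp(theta0, d_over_lambda)' * array_resp(theta, d_over_lambda));

% Evaluate on a grid of angles for d = 2*lambda (plot with plot(theta, pattern))
theta = linspace(-pi/2, pi/2, 721);
pattern = arrayfun(@(t) beam(t, 2), theta);

% Spot checks at broadside and in the desired direction
g_half = beam(0, 0.5);      % spacing of half a wavelength
g_two = beam(0, 2);         % spacing of two wavelengths
g_peak = beam(theta0, 2);
```

For the wider spacing, the pattern attains its maximum value N at broadside as well as at the desired angle, which is the grating lobe phenomenon described in part (d).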


Problem 8.11 (SDMA) The base station in a cellular network wishes to simultaneously send
different data streams to two different mobiles. Assume that it has a linear array with 16
elements uniformly spaced at $\lambda/3$. Mobile A is at angle 20° from broadside and Mobile B is
at angle -30° from broadside.

(a)  Compute  the  array  responses  corresponding  to  each  mobile,  and  plot  the  beamforming  pat¬ 
terns  if  the  base  station  were  only  communicating  with  one  mobile  at  a  time. 

(b)  Now,  suppose  that  the  base  station  employs  SDMA  using  zero-forcing  interference  suppres¬ 
sion  to  send  to  both  mobiles  simultaneously.  Plot  the  beam  patterns  used  to  send  to  Mobile  A 
and  Mobile  B,  respectively. 

(c) What is the noise enhancement in (b) relative to (a)?

(d) Repeat (a)-(c) when Mobile B is at angle 10° from broadside (i.e., closer to Mobile A in
angular spacing). You should notice a significant increase in noise enhancement.

(e)  Try  playing  around  with  different  values  of  angular  spacing  between  mobiles  to  determine 
when  the  base  station  should  attempt  to  use  SDMA  (e.g.,  what  is  the  minimum  angular  spacing 
at  which  the  noise  enhancement  is,  say,  less  than  3  dB). 


Problem 8.12 (Outage rates with receive diversity) Consider a 1 x N MIMO system with
receive diversity. The gain relative to a SISO system is given by (8.64):

$G = |h_1|^2 + \ldots + |h_N|^2$

For our rich scattering model, $h_i \sim CN(0,1)$ are i.i.d., hence $|h_i|^2$ are i.i.d. exponential random
variables, each with mean one. We state without proof that the sum of N such random variables
is a Gamma random variable with PDF and CDF given by

$p_G(g) = \frac{g^{N-1}}{(N-1)!} e^{-g}, \quad g \geq 0$    (8.92)

$F_G(g) = P[G \leq g] = e^{-g} \sum_{k=N}^{\infty} \frac{g^k}{k!} = 1 - e^{-g} \sum_{k=0}^{N-1} \frac{g^k}{k!}, \quad g \geq 0$    (8.93)

(a) Use the preceding results to compute the probability of outage (log scale) for N-fold receive
diversity versus link margin (dB) relative to the SISO benchmark for N = 1, 2, 4. That is,
reproduce the results displayed in Figure 8.20.

(b)  Optional  It  may  be  an  interesting  exercise  to  use  simulations  to  compute  the  empirical  CDF 
of  G,  and  to  check  that  you  get  the  same  outage  rate  curves  as  those  in  (a). 
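
The CDF (8.93) can be evaluated directly from the finite sum, or with Matlab's regularized incomplete gamma function gammainc, which coincides with it for integer N. The sketch below compares the two at an illustrative point. The last two lines rest on an assumption about how outage is defined here: that with a link margin of m dB over the SISO benchmark, outage occurs when G falls below 10^(-m/10).

```matlab
N = 4;                      % diversity order (illustrative)
g = 1.5;                    % evaluation point (illustrative)

% Finite-sum form of the CDF in (8.93)
k = 0:N-1;
F_closed = 1 - exp(-g)*sum(g.^k ./ factorial(k));

% Same quantity via the regularized lower incomplete gamma function
F_gammainc = gammainc(g, N);

% Assumed outage definition: G < 10^(-margin/10)
margin_dB = 0:5:30;
Pout = gammainc(10.^(-margin_dB/10), N);
```

Plotting Pout against margin_dB with semilogy gives one curve of outage probability versus link margin for each value of N.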
Problem 8.13 (Alamouti scheme with multiple receive antennas) Consider a 2 x N
MIMO system where the transmitter employs Alamouti space-time coding as in (8.67). Let
$H = (h_1\ h_2)$ denote the N x 2 channel matrix, with (N x 1) columns $h_1$ and $h_2$.

(a) Show that the optimal receiver is given by (8.70), where $u_1 = \frac{1}{\sqrt{2}}(h_1, h_2^*)^T$ and
$u_2 = \frac{1}{\sqrt{2}}(h_2, -h_1^*)^T$, and

$y = \begin{pmatrix} y[1] \\ y^*[2] \end{pmatrix}$

(b) Show that the SNR gain relative to our unfaded SISO system with the same transmit power
and constant channel gain of unity is given by

$G = \frac{1}{2} \sum_{j=1}^{N} \sum_{k=1}^{2} |H(j,k)|^2$

Comparing with the receive diversity gain (8.64) in a 1 x N system, answer the following
True/False questions (give reasons for your answers).

(c) True or False: A 2 x 2 MIMO system with Alamouti space-time coding is 3 dB better than
a 1 x 2 MIMO system with receive diversity.

(d) True or False: A 2 x 2 MIMO system with Alamouti space-time coding is 3 dB worse than
a 1 x 4 MIMO system with receive diversity.

(e) Use the approach in Problem 8.12 to compute and plot the outage probability (log scale) versus
link margin (dB) relative to the unfaded SISO system for a 2 x N MIMO system, N = 1, 2, 4.
You should get a plot that follows those in Figure 8.22.


Problem 8.14 (Outage rates for spatial multiplexing with ZF reception) Consider two-fold
spatial multiplexing in a 2 x N MIMO system with N x 2 channel matrix $H$. Define the
2 x 2 matrix $R = H^H H$.

(a) Referring to the spatial multiplexing model (8.73), how do the entries of $R$ relate to the
signal vectors $u_1$ and $u_2$?

(b) Show that the energy of the projection of $u_1$ orthogonal to the subspace spanned by $u_2$ is
given by

$E_1 = R(1,1) - \frac{|R(1,2)|^2}{R(2,2)}$

where $R(i,j)$ denotes the $(i,j)$th entry of $R$, $i,j = 1,2$.

(c) If we fix the transmit power to that of the SISO benchmark (splitting it equally between the
two data streams), show that the gain seen by the first data stream is given by

$G_1 = \frac{E_1}{2} = \frac{1}{2} \left( R(1,1) - \frac{|R(1,2)|^2}{R(2,2)} \right)$

Similarly, the gain seen by the second data stream is given by

$G_2 = \frac{1}{2} \left( R(2,2) - \frac{|R(1,2)|^2}{R(1,1)} \right)$

Note that, under our rich scattering model, $G_1$ and $G_2$ are identically distributed random
variables.

(c) Use computer simulations with the rich scattering model to plot the outage rate versus link
margin for 2 x 2 and 2 x 4 MIMO with two-fold spatial multiplexing and ZF reception. You
should get a plot similar to Figure 8.24. Discuss how the performance compares with that of the
Alamouti scheme.
Problem 8.15 (Outage rates for spatial multiplexing with ZF reception) (a) Argue
that 2 x N Alamouti space-time coding is exactly 3 dB worse than 1 x 2N receive diversity.

(b) For 2 x N spatial multiplexing with ZF reception, approximate its performance as x dB
worse than a 1 x N' receive diversity. (Note that spatial multiplexing has twice the bandwidth
efficiency of receive diversity, but it loses 3 dB of power up front due to splitting it between the
two data streams.)

Answer: Approximately 3 dB worse than 1 x (N - 1) receive diversity. That is, if the gain relative
to the SISO benchmark for a 1 x N receive diversity system is denoted as $G_{rx-div}(N)$, then the
gain for a 2 x N spatial multiplexed system is $G_{smux}(N) \approx G_{rx-div}(N-1)/2$. Thus, the CDF,
and hence outage rate, of the spatially multiplexed system is approximated as

$P[G_{smux}(N) \leq x] \approx P[G_{rx-div}(N-1) \leq 2x]$

(c) Use the results in (b), and the analytical framework in Problem 8.12, to obtain an analytical
approximation for the simulation results in Problem 8.14(c).


Software Lab 8.1: Introduction to Equalization in Single-carrier Systems

Lab  Objectives:  To  understand  the  need  for  equalization  in  communication  systems,  and  to 
implement  linear  MMSE  equalizers  adaptively. 

Reading: Sections 8.1-8.2; Chapter 4 (linear modulation); Section 5.6 (Gaussian random variables
and the Q function). This lab can be completed without systematic coverage of Chapter
6; we state and use probability of error expressions from Chapter 6, but knowing how they are
derived is not required for the lab.


Laboratory  Assignment 

0)  Use  as  your  transmit  and  receive  filters  the  SRRC  pulse  employed  in  Software  Labs  4.1  and 
6.1.  Putting  the  code  for  realizing  these  together  with  the  code  fragments  developed  in  this 
chapter  provides  the  code  required  for  this  lab.  As  in  Software  Labs  4.1  and  6.1,  the  transmit, 
channel,  and  receive  filters  are  implemented  at  rate  4/T.  For  simplicity,  we  consider  BPSK 
signaling  throughout  this  lab,  and  consider  only  real-valued  signals.  Generate  nsymbols  = 
ntraining  +  npayload  (numbers  to  be  specified  later)  ±1  BPSK  symbols  as  in  Lab  6.1,  and  pass 
them  through  the  transmit,  channel,  and  receive  filters  to  get  noiseless  received  samples  at  rate 
4/T. 

1)  Let  us  start  with  a  trivial  channel  filter  as  before.  Set  nsymbols  =  200.  The  number  of  rate 
4/T  samples  at  the  output  of  the  receive  filter  is  therefore  800,  plus  tails  at  either  end  because 
the  length  of  the  effective  pulse  modulating  each  symbol  extends  over  multiple  symbol  intervals. 
Plot  an  eye  diagram  (e.g.,  using  code  fragment  8.1.3)  using,  say,  400  samples  in  the  middle.  You 
should  get  an  eye  diagram  that  looks  like  Figure  8.26:  the  cascade  of  the  transmit  and  receive 
filter  is  approximately  Nyquist,  and  the  eye  is  open,  so  that  we  can  find  a  sampling  time  such 
that  we  can  distinguish  between  +1  and  -1  well,  despite  the  influence  of  neighboring  symbols. 

2) Now introduce a non-trivial channel filter. In particular, consider a channel filter specified (at
rate 4/T) using the following Matlab command:

channel_filter = [-0.7, -0.3, 0.3, 0.5, 1, 0.9, 0.8, -0.7, -0.8, 0.7, 0.8, 0.6, 0.3]';

Generate an eye diagram again. You should get something that looks like Figure 8.27. Notice
now that there is no sampling time at which you can clearly make out the difference between
+1 and -1 symbols. The eye is now said to be closed due to ISI, so that we cannot make symbol
decisions just by passing appropriately timed received samples through a thresholding device.

Figure 8.26: Eye diagram for a non-dispersive channel. The eye is open.

Figure 8.27: Eye diagram for a dispersive channel. The eye is closed.

3) We are now going to evaluate probability of error without and with equalization. First, let
us generate the noisy output of the receive filter. We need to generate nsymbols = ntraining +
npayload (numbers to be specified later) ±1 BPSK symbols as in Software Lab 6.1, and pass them
through the transmit filter, the dispersive channel, and the receive filter to get noiseless received
samples at rate 4/T. Since we are signaling along the real axis only, at the input to the receive
filter, add iid $N(0, \sigma^2)$ real-valued noise samples (as in Lab 6.1, choose $\sigma^2 = \frac{N_0}{2}$ corresponding
to a specified value of $\frac{E_b}{N_0}$). Pass these (rate 4/T) noise samples through the receive filter, and
add the result to the signal contribution at the receive filter output.

4) Performance without equalization: Let $\{r_k\}$ denote the output of the receive filter, and
let $Z[n] = r_{d+4(n-1)}$, $n = 1, 2, \ldots, nsymbols$ denote the best symbol rate decision statistics you
can obtain by subsampling at rate 1/T the receive filter output. As in the solutions to earlier
labs, choose the decision delay $d$ equal to the location of the maximum of the overall response
(which now includes the channel) to a single symbol. For nsymbols = 10100, compute the error
probability of the decision rule $\hat{b}[n] = \mathrm{sign}(Z[n])$ as a function of $E_b/N_0$, where the latter ranges
from 5 to 20 dB. Compare with the ideal error probability curve for BPSK signaling for the same
range of $E_b/N_0$. This establishes that a simple one-sample per symbol decision rule does not
work well for non-ideal channels, and motivates the equalization schemes discussed below.

Linear equalization: We now consider linear equalization, where the decision for symbol $b_n$ is
based on linear processing of a vector of samples $r[n]$ of length $L = 2M + 1$, where the entries of
$r[n]$ are samples spaced by $T/q$, with the center sample being the same as the decision statistic
in part 4: $q = 1$ corresponds to symbol-spaced sampling, and $q > 1$ corresponds to fractionally
spaced sampling. We consider two cases: $q = 1$ and $q = 2$.

$r[n] = (r_{4(n-1)+d-(4/q)M}, r_{4(n-1)+d-(4/q)(M-1)}, \ldots, r_{4(n-1)+d}, r_{4(n-1)+d+(4/q)}, \ldots, r_{4(n-1)+d+(4/q)M})^T$

The decision rule we use is

$\hat{b}_n = \mathrm{sign}(c^T r[n])$    (8.94)

where $c$ is a correlator whose choice is to be specified. Note that the decision rule in part 4
corresponds to the choice $c = (0, \ldots, 0, 1, 0, \ldots, 0)^T$, since it uses only the center sample.

The vector of samples $r[n]$ contains contributions from both the desired symbol $b_n$ and from ISI
due to $b_{n \pm 1}$, $b_{n \pm 2}$, etc. We implement the linear minimum mean squared error (MMSE) equalizer
using a least squares adaptive implementation, as discussed in Section 8.2.1.

5) For the least squares implementation, assume that the first ntraining symbols are known
training symbols, $b_1, \ldots, b_{ntraining}$. Define the L x L matrix

$\hat{R} = \frac{1}{ntraining} \sum_{n=1}^{ntraining} r[n] r^T[n]$

and the L x 1 vector

$\hat{p} = \frac{1}{ntraining} \sum_{n=1}^{ntraining} b[n] r[n]$

The MMSE correlator is now approximated as

$c_{MMSE} = \hat{R}^{-1} \hat{p}$    (8.95)


6) Now, the correlator obtained via (8.95) is used to make decisions, using the decision rule (8.94),
on the unknown symbols n = ntraining + 1, ..., nsymbols.
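
Steps 5 and 6 can be sketched as follows. To keep the example self-contained, it replaces the rate 4/T filter chain of the lab with a hypothetical symbol-spaced channel h and a small noise level, both chosen only for illustration, and it uses q = 1. The variable names Rmat, Rhat and phat are assumptions of this sketch.

```matlab
ntraining = 100; npayload = 1000;
nsymbols = ntraining + npayload;
M = 2; L = 2*M + 1;                      % equalizer length

b = sign(rand(nsymbols,1) - 0.5);        % +/-1 BPSK symbols
h = [0.2; 1; 0.3];                       % hypothetical symbol-rate channel
sigma = 0.1;                             % illustrative noise level

% Received samples, aligned so that the peak tap multiplies b(n)
z = conv(b, h);
z = z(2:end-1) + sigma*randn(nsymbols,1);

% Column n of Rmat is r[n], a window of L samples centered on sample n
zpad = [zeros(M,1); z; zeros(M,1)];
Rmat = zeros(L, nsymbols);
for n = 1:nsymbols
    Rmat(:,n) = zpad(n:n+L-1);
end

% Least squares estimates from the training symbols
Rtrain = Rmat(:,1:ntraining);
Rhat = (Rtrain*Rtrain')/ntraining;
phat = (Rtrain*b(1:ntraining))/ntraining;
c = Rhat \ phat;                         % correlator, as in (8.95)

% Decisions on the payload, as in (8.94), and empirical error rate
bhat = sign(Rmat(:,ntraining+1:end)'*c);
ber = mean(bhat ~= b(ntraining+1:end));
```

In the lab itself, the windows r[n] are drawn from the receive filter output at spacing T/q around the decision delay d, but the training and decision steps keep the same form.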
7) Fix ntraining = 100 and npayload = 10000. For L = 3, 5, 7, 9, and q = 1, 2, implement linear
MMSE equalizers, and plot their error probabilities (for the payload symbols) as a function of
$E_b/N_0$, in the range 5 to 20 dB. Compare them with the unequalized error probability and the
ideal error probability found in part 4.

Hint: An efficient way to generate the statistics $c^T r[n]$ is to pass an appropriate rate 2/T
subsequence of the receive filter output through a filter whose impulse response is the time
reverse of $c$, and to then appropriately subsample at rate 1/T the output of the equalizing filter.
This is much faster than correlating $c$ with $r[n]$ for each n.
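
The equivalence used in this hint can be verified on a short example. In the sketch below, the correlator c and the sample stream r are hypothetical stand-ins; the point is only that the output of filter with the time-reversed correlator, read at index n, equals the correlation of c with the window of samples ending at n.

```matlab
c = [0.1; -0.2; 1; 0.3; -0.1];      % hypothetical real-valued correlator
L = length(c);
r = randn(50,1);                    % stand-in for the sample stream

% Filter with impulse response equal to the time reverse of c
out = filter(flipud(c), 1, r);

% Direct correlation with the window of L samples ending at index n
n = 20;
direct = c' * r(n-L+1:n);
filtered = out(n);
```

Since the window ends at index n, a window centered on a given sample is read from the filter output M samples later, where L = 2M + 1.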

8) Comment on the performance of symbol-spaced versus fractionally spaced equalization. Comment
on the effect of equalizer length on performance. What is the effect of increasing or
decreasing the training period?

Lab  Report:  Your  lab  report  should  document  the  results  of  the  preceding  steps  in  order. 
Describe  the  reasoning  you  used  and  the  difficulties  you  encountered. 


Software Lab 8.2: Simplified Simulation Model for an OFDM link

Lab Objectives: To develop a hands-on understanding of basic OFDM transmission and
reception.

Reading:  Section  8.3  (OFDM);  Chapter  4  (linear  modulation). 

Laboratory  Assignment 

We  would  like  to  leverage  the  code  from  Software  Lab  8.1  as  much  as  possible,  so  we  set  the  DAC 
filter  to  be  the  transmit  filter  in  that  lab,  and  the  receive  filter  to  its  matched  filter  as  before. 
The  main  difference  is  that  the  time  domain  samples  sent  through  the  DAC  filter  are  obtained 
by  taking  the  inverse  FFT  of  the  frequency  domain  symbols,  and  inserting  a  cyclic  prefix.  We 
fix  the  constellation  as  Gray  coded  QPSK. 

Step 1 (Exploring time and frequency domain relationships in OFDM): Let us first
discuss the structure of a single "OFDM symbol," which carries N complex-valued symbols in
the frequency domain. Here N is the number of subcarriers, chosen to be a power of 2. Set L to
be the length of the cyclic prefix. Set N = 256, L = 20 for these initial explorations, but keep the
parameters programmable for later use.

1a) Generate N Gray coded QPSK symbols $B = \{B[k], k = 1, \ldots, N\}$. (You can use the function
qpskmap developed in Software Lab 6.1 for this purpose.) Take the inverse FFT to obtain time
domain samples $b = \{b[n], n = 1, \ldots, N\}$.

1b) Append the last L time domain samples to the beginning, to get a length N + L sequence
of time domain samples $b' = \{b'[n], n = 1, \ldots, N + L\}$. That is, $b'[1] = b[N - L + 1], \ldots, b'[L] =
b[N]$, $b'[L + 1] = b[1], \ldots, b'[N + L] = b[N]$.

1c) Take the first N symbols of $b'$, say $r_1 = \{b'[1], \ldots, b'[N]\}$. Show that the FFT output (say
$R_1 = \{R_1[k]\}$) is related to the original frequency domain symbols $B$ through a frequency domain
channel $H$ as follows: $R_1[k] = H[k]B[k]$. Find and plot the amplitude $|H[k]|$ and phase $\arg(H[k])$
versus k.

1d) Repeat 1c) for the time domain samples $\{b'[3], \ldots, b'[N + 2]\}$ (i.e., skip the first two samples
of $b'$). How are the frequency domain channels in 1c) and 1d) related? (What we are doing here
is exploring what cyclic shifts in the time domain do in the frequency domain.)
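
The relationship explored in this step can be checked numerically with a few lines. The sketch below generates QPSK symbols directly from random signs (the Gray mapping of bits is omitted, since only the symbol values matter here) and compares the measured frequency domain channel with the pure phase ramp predicted by the cyclic shift property of the DFT. The index k runs from 0 to N-1 in the prediction, following the usual DFT convention.

```matlab
N = 256; L = 20;

% Unit-energy QPSK symbols in the frequency domain (bit mapping omitted)
B = (sign(rand(N,1)-0.5) + 1j*sign(rand(N,1)-0.5))/sqrt(2);

b = ifft(B);                        % time domain samples
bp = [b(N-L+1:N); b];               % prepend cyclic prefix of length L

r1 = bp(1:N);                       % first N samples, as in the first case
R1 = fft(r1);
H = R1 ./ B;                        % measured frequency domain channel

% Prediction: the window is b cyclically delayed by L samples
k = (0:N-1)';
H_pred = exp(-1j*2*pi*k*L/N);
mismatch = max(abs(H - H_pred));
```

Replacing bp(1:N) by bp(3:N+2) changes the cyclic delay from L to L-2, which changes only the slope of the phase.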

Step  2  (generating  multiple  OFDM  symbols):  Now,  we  generate  K  frames,  each  carrying 
N  Gray  coded  QPSK  symbols.  Set  N  =  256,  L  =  20,  K  =  5  for  numerical  results  and  plots  in 
this  step. 

2a) For each frame, generate time domain samples and add a cyclic prefix, as in Steps 1a) and
1b). Then, append the time domain samples for successive frames together. We now have a
stream of $K(N + L)$ time domain samples, analogous to the time domain symbols sent in Lab 3.

2c) Pass the time domain symbols through the same transmit filter (this is the DAC in Figure
8.9) as in Software Labs 4.1, 6.1 and 8.1, again oversampling by a factor of 4. (That is, if the
time domain samples are at rate $1/T_s$, the filter is implemented at rate $4/T_s$.) This gives us a
rate $4/T_s$ transmitted signal.

2d)  Compute  the  peak  to  average  power  ratio  (PAPR)  in  dB  for  this  transmitted  signal  (OFDM 
is  notorious  for  having  a  large  PAPR).  This  is  done  by  taking  the  ratio  of  the  maximum  to 
average  value  of  the  magnitude  squared  of  the  time  domain  samples. 
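
The PAPR computation itself is a one-line function. The sketch below defines it as a hypothetical helper papr_db and applies it to frequency domain QPSK symbols and to their inverse FFT; in the lab, it would be applied to the rate 4/Ts transmitted signal instead.

```matlab
% PAPR in dB: ratio of maximum to average magnitude squared
papr_db = @(x) 10*log10(max(abs(x).^2)/mean(abs(x).^2));

N = 256;
B = (sign(rand(N,1)-0.5) + 1j*sign(rand(N,1)-0.5))/sqrt(2);  % QPSK symbols
b = ifft(B);                                                   % time domain samples

papr_freq = papr_db(B);     % constant-modulus symbols
papr_time = papr_db(b);     % after mixing by the inverse FFT
```

The frequency domain symbols have constant magnitude, so their PAPR is 0 dB, while the time domain samples show a noticeably larger value.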

2e) Note that the original QPSK symbols in the frequency domain have a PAPR of one, but the
time domain samples are generated by mixing these together. The time domain samples could
be expected, therefore, to have a Gaussian distribution, invoking the central limit theorem. Plot
a histogram of the I and Q components from the time domain samples. Do they look Gaussian?

2f) As in Lab 6.1, assume an ideal channel filter and pass the transmitted signal through a receive
filter matched to the transmit filter. This gives a rate $4/T_s$ noiseless received signal.

2g) Subsample the received signal at rate $1/T_s$, starting with a delay of d samples (play around
and see what choice of d works well, perhaps based on the peak of the cascade of the transmit,
channel and receive filter). The first N samples correspond to the first frame. Take the FFT
of these N samples to get $\{R_1[k]\}$. Now, estimate the frequency domain channel coefficients
$\{H[k]\}$ by using the known transmitted symbols $B_1[k]$ in the first frame as training. That is,

$\hat{H}[k] = R_1[k]/B_1[k]$

Plot the magnitude and phase of the channel estimates and comment on how they compare to
what you saw in 1c) and 1d).

2h) Now, use the channel estimate from 2g) to demodulate the succeeding frames. If frame m
uses time domain samples over a window $[a, b]$, then frame m + 1 uses time domain samples over
a window $[a + (N + L)T_s, b + (N + L)T_s]$. Denoting the FFT of the time domain samples for
frame m as $R_m[k]$, the decision statistics for the frequency domain symbols for the mth frame
are given by

$\hat{B}_m[k] = \hat{H}^*[k] R_m[k]$

You  can  now  decode  the  bits  and  check  that  you  get  a  BER  of  zero  (there  is  no  noise  so  far). 
Also,  display  scatter  plots  of  the  decision  statistics  to  see  that  you  are  indeed  seeing  a  QPSK 
constellation  after  compensating  for  the  channel. 
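
The training and demodulation steps can be sketched end to end in a simplified setting. The example below works entirely at the sample rate with a short hypothetical channel h in place of the oversampled transmit, channel and receive filter cascade, and it takes the FFT window after the cyclic prefix. These simplifications, and the small values of N, L and K, are assumptions made to keep the sketch short.

```matlab
N = 64; L = 8; K = 3;
h = [1; 0.5; -0.2];                 % hypothetical channel, memory less than L

% QPSK symbols for K frames, one frame per column (bit mapping omitted)
B = (sign(rand(N,K)-0.5) + 1j*sign(rand(N,K)-0.5))/sqrt(2);

b = ifft(B);                        % inverse FFT of each column
bp = [b(N-L+1:N,:); b];             % add cyclic prefix to each frame
tx = bp(:);                         % concatenate frames

rx = filter(h, 1, tx);              % noiseless channel output
rxf = reshape(rx, N+L, K);
R = fft(rxf(L+1:end,:));            % FFT of N samples per frame

% Channel estimate from the first frame, used as training
Hhat = R(:,1) ./ B(:,1);

% Decision statistics for the remaining frames
Bhat = repmat(conj(Hhat),1,K-1) .* R(:,2:K);

% Count sign errors on the I and Q components
Bref = B(:,2:K);
nerr = sum(sum(sign(real(Bhat)) ~= sign(real(Bref)))) + ...
       sum(sum(sign(imag(Bhat)) ~= sign(imag(Bref))));
```

Multiplying by the conjugate of the channel estimate removes the channel phase, which is all that is needed for QPSK decisions; the magnitude scaling does not affect the signs.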

Step 3 (Channel compensation): We now introduce a nontrivial channel (still no noise).
Increase the cyclic prefix length if needed (it should be long enough to cover the cascade of the
transmit, channel and receive filters). But remember that the cyclic prefix is at rate $1/T_s$, whereas
the filter cascade is at rate $4/T_s$.

3a) Repeat Step 2f, but now with a nontrivial channel filter modeled at rate $4/T_s$. Use the
channels you have tried out in Lab 3 (still no noise). For example:

channel_filter = [-0.7, -0.3, 0.3, 0.5, 1, 0.9, 0.8, -0.7, -0.8, 0.7, 0.8, 0.6, 0.3]';

3b) Repeat Step 2g. Comment on how the magnitude and phase of the frequency domain channel
differs from what you saw in 2g, 1c and 1d.

3c)  Repeat  Step  2h.  Check  that  you  get  a  BER  of  zero,  and  that  your  decision  statistics  give 
nice  QPSK  scatter  plots. 

3d) Check that everything still works out as you vary the number of subcarriers N (e.g., N =
512, 1024, 2048), the cyclic prefix length L and the number of frames K.

Step 4 (Effect of noise): Now, add noise as in Software Labs 6.1 and 8.1. Specifically, at the
input to the receive filter, add independent and identically distributed (iid) complex Gaussian
noise, such that the real and imaginary part of each sample are iid $N(0, \sigma^2)$ (we choose $\sigma^2 = \frac{N_0}{2}$
corresponding to a specified value of $\frac{E_b}{N_0}$). Let us fix N = 1024 for concreteness, and set the cyclic
prefix to just a little longer than the minimum required for the channel you are considering. Set
the number of frames to K = 10. Try a couple of values of $E_b/N_0$: 5 dB and 8 dB.

4a) While you can estimate $E_b$ analytically, estimate it by taking the energy of the transmitted
signal in 3a, and dividing it by the number of bits in the payload (i.e., excluding the first frame).
Use this to set the value of $N_0$ for generating the noise samples.

4b)  Pass  the  (rate  4/Ts)  noise  samples  through  the  receive  filter,  and  add  the  result  to  the  output 
of  part  3a. 

4c)  Consider  first  a  noiseless  channel  estimate,  in  which  you  carry  out  Step  3b  (estimating  the 
channel  based  on  frame  1)  before  you  add  noise  to  the  output  of  3a.  Now  add  the  noise  and 
carry  out  Step  3c  (demodulating  the  other  frames).  Estimate  the  BER  and  compare  with  the 
analytical  value  for  ideal  QPSK.  Show  the  scatter  plots  of  the  decision  statistics. 

4d)  Repeat  4c,  except  that  you  now  estimate  the  channel  based  on  frame  1  after  adding  noise. 
Discuss  how  the  BER  degrades.  Compare  the  channel  estimates  from  parts  4c  and  4d  on  the 
same  plot. 

Note:  You  may  notice  a  significant  BER  degradation,  but  that  is  because  the  channel  estimation 
technique  is  naive  (the  channel  coefficients  for  neighboring  subcarriers  are  highly  correlated,  but 
our  estimate  is  not  exploiting  this  property).  Exploring  better  channel  estimation  techniques  is 
beyond  the  scope  of  this  lab,  but  you  are  encouraged  to  browse  the  literature  on  OFDM  channel 
estimation  to  dig  deeper. 

Step  5  (Consolidation):  Once  you  are  happy  with  your  code,  plot  the  BER  (log  scale)  as  a 
function  of  Eb/N0  (dB)  for  the  channel  in  Lab  3.  Plot  three  curves:  ideal  QPSK,  OFDM  with 
noiseless  channel  estimation,  OFDM  with  noisy  channel  estimation.  Comment  on  the  relation 
between  the  curves. 

Lab  Report:  Your  lab  report  should  document  the  results  of  the  preceding  steps  in  order. 
Describe  the  reasoning  you  used  and  the  difficulties  you  encountered. 


Software  Lab  8.3:  MIMO  signal  processing 

Reading:  Section  8.4 

Lab  Objectives:  To  gain  hands-on  exposure  to  basic  MIMO  signal  processing  at  the  transmitter 
and  receiver. 


Laboratory  Assignment 


Background:  Consider  the  rich  scattering  model  for  a  single  subcarrier  in  a  MIMO-OFDM 
system  with  M  transmit  and  N  receive  antennas.  The  N  x  M  channel  matrix  H  is  modeled  as 
having  i.i.d.  CN(0, 1)  entries. 


Code Fragment 8.7.1 (MIMO matrix with i.i.d. complex Gaussian entries)

%M,N specified earlier

%MIMO matrix with iid CN(0,1) entries

H=(randn(N,M)+j*randn(N,M))/sqrt(2);


Let $T$ denote the number of time domain samples for our system. Let $x_i[t]$ denote the sample
transmitted from transmit antenna $i$ at time $t$, where $1 \le i \le M$ and $1 \le t \le T$. Let
$\mathbf{x}[t] = (x_1[t], \ldots, x_M[t])^T$ denote the $M \times 1$ vector of samples transmitted at time $t$, and let
$\mathbf{X} = (\mathbf{x}[1], \ldots, \mathbf{x}[T])$ denote the $M \times T$ matrix containing all the transmitted samples. Our
convention is to normalize the net transmit power to one, so that $|x_i[t]|^2 = \frac{1}{M}$. For a single input
single output (SISO) system, this would lead to an average received SNR of $\overline{\mathrm{SNR}} = \frac{1}{2\sigma^2}$, since the
magnitude squared of the channel gain is normalized to one, and the noise per receive antenna
is modeled as $CN(0, 2\sigma^2)$. We vary this hypothetical SISO system SNR when evaluating
performance.

The $N \times T$ received matrix $\mathbf{Y}$, whose $t$th column $(y_j[t],\ 1 \le j \le N)$ is the spatial vector of
received samples at time $t$, is then modeled as

$\mathbf{Y} = \mathbf{H}\mathbf{X} + \mathbf{N}$

where $\mathbf{N}$ is an $N \times T$ matrix with i.i.d. $CN(0, 2\sigma^2)$ entries. This model is implemented in the
following code fragment.

Code Fragment 8.7.2 (Received signal in MIMO system)

%snrbardb specified earlier
%express snrbar in linear scale
snrbar = 10^(snrbardb/10);

%find noise variance, assuming TX power = 1
%(snrbar = 1/(2*sigma^2))
sigma = sqrt(1/(2*snrbar));

%X = MxT matrix of symbols, already specified earlier
%(normalized to unit power per time)

%%RECEIVED SIGNAL MODEL: N x T matrix
Y = H*X + sigma*randn(N,T) + 1j*sigma*randn(N,T);
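
As a quick sanity check on the noise calibration in Code Fragment 8.7.2, the following sketch draws a long noise vector and compares its sample variance with 2*sigma^2, which should correspond to the hypothetical SISO SNR of 1/(2*sigma^2). The variable names (Tcheck, W, est_var) and the 10 dB setting are illustrative assumptions, not part of the lab.

```matlab
% Sanity check (illustrative): noise calibration for the hypothetical SISO SNR
snrbardb = 10;                      % assumed SNR in dB
snrbar = 10^(snrbardb/10);          % linear scale
sigma = sqrt(1/(2*snrbar));         % per-dimension noise standard deviation
Tcheck = 1e5;                       % many samples, so that the estimate is tight
W = sigma*randn(1,Tcheck) + 1j*sigma*randn(1,Tcheck);  % CN(0,2*sigma^2) noise
est_var = mean(abs(W).^2);          % sample estimate of E|w|^2
fprintf('2*sigma^2 = %.4f, estimated = %.4f\n', 2*sigma^2, est_var);
fprintf('implied SNR = %.2f dB\n', 10*log10(1/est_var));
```

If the estimated variance is far from 2*sigma^2, the most likely culprit is a mismatch between the dB and linear-scale SNR variables.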

In  order  to  use  the  preceding  generic  code  fragments  for  a  particular  MIMO  scheme,  we  must 
(a)  map  the  transmitted  symbols  into  the  matrix  X  of  transmitted  samples,  and  (b)  process  the 
matrix  Y  of  received  samples  appropriately. 

Alamouti  space-time  code:  We  first  consider  the  Alamouti  space-time  code  for  a  2  x  1  MIMO 
system.  The  transmitted  samples  can  be  generated  using  the  following  code  fragment. 

Code Fragment 8.7.3 (Transmitted samples for Alamouti space-time code)

%assume number of time samples T has been specified (T even)
%QPSK symbols normalized to unit power per symbol
symbols=(sign(rand(1,T)-0.5)+1j*sign(rand(1,T)-0.5))/sqrt(2);

%Alamouti space-time code mapping
X=zeros(2,T); %M=2
X(1,1:2:T) = symbols(1:2:T); %odd samples from antenna 1
X(2,1:2:T) = symbols(2:2:T); %odd samples from antenna 2
X(1,2:2:T) = -conj(symbols(2:2:T)); %even samples from antenna 1
X(2,2:2:T) = conj(symbols(1:2:T)); %even samples from antenna 2


Step 1: Consider a 2 x 1 MIMO system. Setting M = 2, N = 1 and SNR at 10 dB, put code
fragments  8.7.1,  8.7.3,  and  8.7.2  together  to  model  the  transmitted  and  received  matrices  X  and 
Y.  Setting  T  =  100,  do  a  scatter  plot  of  the  real  and  imaginary  parts  of  the  received  samples. 
The  received  samples  should  be  smeared  out  over  the  complex  plane,  since  the  signals  from  the 
two  transmit  antennas  interfere  with  each  other  at  the  receive  antenna. 

Step 2: Compute the decision statistics (8.70) based on the received matrix Y. You may use
the following code fragment, but you must explain what it is doing. Do a scatter plot of the
decision statistics. You should recover the noisy QPSK constellation.

Code Fragment 8.7.4 (Receiver processing for Alamouti space-time code for a 2 x 1 MIMO system)

Ytilde = zeros(2,T/2); %assume T even
Ytilde(1,:) = Y(1,1:2:T);
Ytilde(2,:) = conj(Y(1,2:2:T));
u1=[H(1,1);conj(H(1,2))]; u2=[H(1,2);-conj(H(1,1))];
Z(1:2:T) = u1'*Ytilde;
Z(2:2:T) = u2'*Ytilde;

Step 3: Repeat Steps 1 and 2 for a few different realizations of the channel matrix. The quality
of the scatter plot in Step 2 should depend on the channel gain $G = |H(1,1)|^2 + |H(1,2)|^2$.

Step 4: Now, suppose that we use the Alamouti space-time code for a 2 x N MIMO system,
where N may be larger than one. Show that only the receiver processing code fragment 8.7.4
needs to be modified (other than changing the value of N in the other code fragments), with
Ytilde having dimension 2N x T/2 and u1, u2 each having dimension 2N x 1. Implement these
modifications  and  do  a  scatter  plot  of  the  decision  statistics  for  N  =  2  and  N  =  4,  fixing  the 
equivalent  SISO  SNR  to  10  dB.  You  should  notice  a  qualitative  improvement  with  increasing  N 
as  you  run  several  channel  matrices,  although  the  plots  depend  on  the  channel  realization. 

Hint:  See  Problem  8.13. 

We now consider spatial multiplexing in a 2 x N MIMO system. We can now send 2T symbols
over  T  time  intervals,  as  in  the  following  code  fragment. 

Code Fragment 8.7.5 (Transmitted samples for two-fold spatial multiplexing)

%QPSK symbols normalized to unit power
symbols=(sign(rand(1,2*T)-0.5)+1j*sign(rand(1,2*T)-0.5))/sqrt(2);

X=zeros(M,T); %M=2

%normalize samples so as to emit unit power per unit time
X(1,:) = symbols(1:2:2*T)/sqrt(2);
X(2,:) = symbols(2:2:2*T)/sqrt(2);


Step  5:  Setting  M  =  2,  N  =  4  and  SNR  at  10  dB,  put  code  fragments  8.7.1,  8.7.5,  and  8.7.2 
together  to  model  the  transmitted  and  received  matrices  X  and  Y.  Setting  T  =  100,  again  do 
a  scatter  plot  of  the  real  and  imaginary  parts  of  the  received  samples  for  each  received  antenna. 
As  before,  the  received  samples  should  be  smeared  out  over  the  complex  plane,  since  the  signals 
from  the  two  transmit  antennas  interfere  with  each  other  at  the  receive  antennas. 

Step 6: Now, apply a ZF correlator as in (8.74) to Y to separate the two data streams. Do
scatter  plots  of  the  two  estimated  data  streams.  You  should  recover  noisy  QPSK  constellations. 
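
One possible starting point for Step 6 is sketched below. It assumes that (8.74) corresponds to the standard zero-forcing form, in which the received matrix is multiplied by the left pseudo-inverse of H; check this against the text before relying on it. The names Hpinv, Z, Znoiseless and max_err are illustrative. The noiseless check at the end only confirms that the two streams separate exactly when there is no noise.

```matlab
% ZF sketch (illustrative), self-contained with its own small setup.
% Assumption: (8.74) is the standard pseudo-inverse form of the ZF correlator.
M = 2; N = 4; T = 100;
H = (randn(N,M) + 1j*randn(N,M))/sqrt(2);            % i.i.d. CN(0,1) channel
symbols = (sign(rand(1,2*T)-0.5) + 1j*sign(rand(1,2*T)-0.5))/sqrt(2);
X = [symbols(1:2:2*T); symbols(2:2:2*T)]/sqrt(2);    % two streams, unit total power
sigma = sqrt(1/(2*10^(10/10)));                      % 10 dB hypothetical SISO SNR
Y = H*X + sigma*(randn(N,T) + 1j*randn(N,T));        % received N x T matrix
Hpinv = (H'*H)\H';       % M x N left pseudo-inverse, so that Hpinv*H = eye(M)
Z = Hpinv*Y;             % row k of Z holds the decision statistics for stream k
Znoiseless = Hpinv*(H*X);                            % same processing, no noise
max_err = max(abs(Znoiseless(:) - X(:)));            % should be near machine precision
```

Scatter plots of real(Z(k,:)) against imag(Z(k,:)) for k = 1, 2 then give the two estimated data streams. Note that the rows of Z are scaled by 1/sqrt(2) relative to unit-power QPSK, because of the transmit power normalization.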

Step  7:  Fixing  SNR  at  10  dB  and  fixing  a  2  x  4  channel  matrix,  compare  the  scatter  plots 
of  the  decision  statistics  for  the  Alamouti  scheme  with  those  for  two-fold  spatial  multiplexing. 
Which  ones  appear  to  be  cleaner? 

Lab Report: Your lab report should document the results of the preceding steps in order.
Describe the reasoning you used and the difficulties you encountered.


Epilogue


We  conclude  with  a  brief  discussion  of  research  and  development  frontiers  in  communication 
systems.  This  discussion  is  speculative  by  its  very  nature  (it  is  difficult  to  predict  progress 
in  science  and  technology)  and  is  significantly  biased  by  the  author’s  own  research  experience. 
There  is  no  attempt  to  be  comprehensive.  The  goal  is  to  highlight  a  few  of  the  exciting  challenges 
in  communication  systems  in  order  to  stimulate  the  reader  to  explore  further. 


The  Continuing  Wireless  Story 

The  growth  of  content  on  the  Internet  continues  unabated,  driven  by  applications  such  as  video- 
on-demand,  online  social  networks,  and  online  learning.  At  the  same  time,  there  have  been 
significant  advances  in  the  sophistication  of  mobile  devices  such  as  smart  phones  and  tablet 
computers,  which  greatly  enhance  the  quality  of  the  content  these  devices  can  support  (e.g., 
smart  phones  today  provide  high-quality  displays  for  video  on  demand).  As  a  result,  users 
increasingly  expect  that  Internet  content  is  ubiquitously  and  seamlessly  available  on  their  mobile 
device.  This  means  that,  even  after  the  runaway  growth  of  cellular  and  WiFi  starting  in  the 
1990s,  wireless  remains  the  big  technology  story.  Mobile  operators  today  face  the  daunting  task 
of  evolving  networks  originally  designed  to  support  voice  into  broadband  networks  supplying  data 
rates of the order of tens of Mbps or more to their users. By some estimates, this requires a 1000-fold
increase  in  cellular  network  capacity  in  urban  areas!  On  the  other  hand,  since  charging  by  the 
byte  is  not  an  option,  this  growth  in  capacity  must  be  accomplished  in  an  extremely  cost-effective 
manner,  which  demands  significant  technological  breakthroughs. 

At  the  other  end  of  the  economic  spectrum,  cellular  connectivity  has  reached  the  remotest  corners 
of  this  planet,  with  even  basic  voice  and  text  messaging  transforming  lives  in  developing  nations 
by  providing  access  to  critical  information  (e.g.,  enabling  farmers  to  obtain  timely  information  on 
market  prices  and  weather).  The  availability  of  more  sophisticated  mobile  devices  implies  that 
ongoing  revolutionary  developments  in  online  education  and  healthcare  can  reach  underserved 
populations  everywhere,  as  long  as  there  is  adequate  connectivity  to  the  Internet.  The  lack  of 
such  connectivity  is  commonly  referred  to  as  the  digital  divide. 

Wireless  researchers  now  face  the  challenge  of  building  on  the  great  expectations  created  by  the 
success  of  the  technologies  they  have  created.  At  one  end,  how  can  we  scale  cellular  network 
capacity  by  several  orders  of  magnitude,  in  order  to  address  the  exponential  growth  in  demand 
for  wireless  data  created  by  smart  mobile  devices?  At  the  other  extreme,  how  do  we  close  the 
digital  divide,  ensuring  that  even  the  most  remote  regions  of  our  planet  gain  access  to  the  wealth 
of  information  available  online?  In  addition,  there  are  a  number  of  specialized  applications  of 
wireless  that  may  assume  significant  importance  as  time  evolves. 

We  summarize  some  key  concepts  driving  this  continuing  technology  story  in  the  following. 

Small cells: There are two fundamental approaches to scaling up data rates: increasing spatial
reuse (i.e., using the same time-bandwidth resources at locations that are far enough apart),
and increasing communication bandwidth. Decreasing cell sizes from macrocells with diameters
of the order of kilometers to picocells with diameters of the order of 100-200 meters increases
spatial reuse, and hence potentially the network capacity, by two orders of magnitude.
Picocellular base stations may be opportunistically deployed on lampposts or rooftops, and see a very
different propagation and interference environment from macrocellular base stations carefully
placed at elevated locations. Interference among adjacent picocells becomes a major bottleneck,
as does the problem of handing off rapidly moving users as they cross cell boundaries (indeed,
cell boundaries are difficult to even define in picocellular networks due to the complexity of
below-rooftop propagation). Thus, it is important to rethink the design philosophy of tightly
controlled deployment and operation in today's macrocellular networks. The scaling and organic
growth of picocellular networks is expected to require a significantly greater measure of
decentralized self-organization, including, for example, auto-configuration for plug-and-play deployment,
decentralized coordination for interference and mobility management, and automatic fault
detection and self-healing. Another critical issue with small cells is backhaul (i.e., connecting each
base station to the wired Internet): pulling optical fiber to every lamppost on which a
picocellular base station is deployed may not be feasible. Finally, we can go to even smaller cells called
femtocells, with base stations typically deployed indoors, in individual homes or businesses, and
using the last mile broadband technology already deployed in such places for backhaul. For both
picocells and femtocells, it is important to devise efficient techniques for sharing spectrum, and
managing potential interference, with the macrocellular network. In essence, we would like to
be able to opportunistically deploy base stations as we do WiFi access points, but coordinate
just enough to avoid the tragedy of the commons resulting from the purely selfish behavior in
unmanaged WiFi networks. Of course, as we learn more about how to scale such self-organized
cellular networks, we might be able to apply some of the ideas to promote peaceful coexistence
in densely deployed and independently operated WiFi networks using unlicensed spectrum. In
short, it is fair to say that there is a clear opportunity and dire need for significant innovations in
overall design approach as well as specific technological breakthroughs, in order to truly attain
the potential of "small cells."

Millimeter  wave  communication:  While  commercial  wireless  networks  deployed  today  employ 
bands  well  below  10  GHz,  there  is  significant  interest  in  exploring  higher  carrier  frequencies,  where 
there  are  vast  amounts  of  spectrum.  Of  particular  interest  are  millimeter  (mm)  wave  frequencies 
from  30-300  GHz,  corresponding  to  wavelengths  from  10  mm  down  to  1  mm.  Historically, 
RF  front  end  technology  for  these  bands  has  been  expensive  and  bulky,  hence  there  was  limited 
commercial  interest  in  using  them.  This  has  changed  in  recent  years,  with  the  growing  availability 
of  low-cost  silicon  radio  frequency  integrated  circuits  (RFICs)  in  these  bands.  The  particular  slice 
of  spectrum  that  has  received  the  most  attention  is  the  60  GHz  band  (from  57-64  GHz).  Most  of 
this  band  is  unlicensed  worldwide.  The  availability  of  7  GHz  of  unlicensed  spectrum  (vastly  more 
than  the  bandwidth  in  current  cellular  and  WiFi  systems)  opens  up  the  possibility  for  another 
revolution  in  wireless  communication,  with  links  operating  at  multiples  of  Gigabits  per  second 
(Gbps).  Potential  applications  of  60  GHz  in  particular,  and  mm  wave  in  general,  include  order  of 
magnitude  increases  in  the  data  rates  for  indoor  wireless  networks,  multiGbps  wireless  backhaul 
networks,  and  base  station  to  mobile  links  in  picocells,  and  even  wireless  data  centers.  However, 
realizing  the  vision  of  multiGbps  wireless  everywhere  is  going  to  take  some  work.  While  we  can 
draw  upon  the  existing  toolkit  of  ideas  developed  for  wireless  communication  to  some  extent, 
we  may  have  to  rethink  many  of  these  ideas  because  of  the  unique  characteristics  of  mm  wave 
communication.  The  latter  largely  follow  from  the  order  of  magnitude  smaller  carrier  wavelength 
relative  to  existing  wireless  systems. 

At  the  most  fundamental  level,  consider  propagation  loss.  As  discussed  in  Section  6.5  (see  also 
Problems  6.37  and  6.38),  the  propagation  loss  for  omnidirectional  transmission  scales  with  the 
square  of  the  carrier  frequency,  but  for  the  same  antenna  aperture,  antenna  directivity  scales 
up  with  the  square  of  the  carrier  frequency.  Thus,  given  that  generating  RF  power  at  high 
carrier  frequencies  is  difficult,  we  anticipate  that  mm  wave  communication  systems  will  employ 
antenna  directivity  at  both  ends.  Since  the  inter-element  spacing  scales  with  carrier  wavelength, 
it  becomes  possible  to  accommodate  a  large  number  of  antenna  elements  in  a  small  area  (e.g., 
a  1000  element  antenna  array  at  60  GHz  is  palm-sized!),  and  to  use  electronic  beamsteering  to 
realize  pencil  beams  at  the  transmitter  and  receiver.  Of  course,  this  is  easier  said  than  done. 
Hardware  realization  of  such  large  arrays  remains  a  challenge.  On  the  algorithmic  side,  since 
building a separate upconverter or downconverter for every antenna element is infeasible as we
scale  up  the  array,  it  is  essential  to  devise  signal  processing  algorithms  that  do  not  assume  the 
availability  of  the  separate  complex  baseband  signals  for  each  antenna  element.  The  nature 
of  diversity  and  spatial  multiplexing  also  fundamentally  changes  at  tiny  wavelengths:  due  to 
the  directionality  of  mm  wave  links,  there  are  only  a  few  dominant  propagation  paths,  so  that 
designs  for  rich  scattering  models  no  longer  apply.  For  indoor  environments,  blockage  by  humans 
and  furniture  becomes  inevitable,  since  the  ability  of  electromagnetic  waves  to  diffract  around 
obstacles  depends  on  how  large  they  are  relative  to  the  wavelength  (i.e.,  obstacles  “look  bigger” 
at tiny wavelengths). For outdoor environments, performance is limited by the oxygen absorption
loss (about 16 dB/km) in the 60 GHz band, and by rain loss for mm wave communication in general
(e.g., as high as 30 dB/km in heavy rain). While link ranges of hundreds
of  meters  can  be  achieved  with  reasonable  margins  to  account  for  these  effects,  longer  ranges 
than  these  would  be  fighting  physics,  hence  multihop  networks  become  interesting.  Of  course, 
once  we  start  forming  pencil  beams,  networking  protocols  that  rely  on  the  broadcast  nature 
of  the  wireless  medium  no  longer  apply.  These  are  just  a  few  of  the  issues  that  are  probably 
going  to  take  significant  research  and  development  to  iron  out,  which  bodes  well  for  aspiring 
communication  engineers. 
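
The scaling arguments above can be checked with a few lines of arithmetic. The sketch below computes carrier wavelengths, the physical size of a large array at 60 GHz, and the extra free-space loss for omnidirectional transmission when the carrier frequency goes up by a factor of ten. The 6 GHz reference carrier, the 32 x 32 square geometry with half-wavelength spacing, and all variable names are illustrative assumptions rather than values taken from the text.

```matlab
% Back-of-the-envelope numbers (illustrative) for wavelength and array size
c = 3e8;                                  % speed of light in m/s (approximate)
fc = [6e9 30e9 60e9 300e9];               % carrier frequencies in Hz (6 GHz is an assumed reference)
lambda = c./fc;                           % wavelengths in meters
for k = 1:length(fc)
    fprintf('fc = %6.1f GHz: wavelength = %6.2f mm\n', fc(k)/1e9, lambda(k)*1e3);
end
% Assumed geometry: 32 x 32 square array (1024 elements), half-wavelength spacing
lambda60 = c/60e9;                        % wavelength at 60 GHz
nside = 32;                               % elements per side
side = (nside-1)*lambda60/2;              % distance between outermost elements
fprintf('32 x 32 array at 60 GHz: about %.1f cm on a side\n', side*100);
% Omnidirectional free-space loss scales as fc^2 at a fixed distance
extra_loss_db = 20*log10(60e9/6e9);       % 60 GHz relative to 6 GHz
fprintf('Extra omnidirectional loss at 60 GHz vs 6 GHz: %.0f dB\n', extra_loss_db);
```

Under these assumptions the array is a little under 8 cm on a side, consistent with the "palm-sized" remark, and the tenfold increase in carrier frequency costs 20 dB for omnidirectional antennas, which is what directivity at both ends must recover.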

Figure  8.28  depicts  how  picocells  and  mm  wave  communication  might  come  together  to  address 
the  cellular  capacity  crisis.  A  large  macrocellular  base  station  provides  default  connectivity 
via  Long  Term  Evolution  (LTE),  a  fourth  generation  cellular  technology  standardized  relatively 
recently;  despite  its  name,  it  may  not  suffice  for  the  long  term  because  of  exponentially  increasing 
demand.  Picocellular  base  stations  are  deployed  opportunistically  on  lampposts,  and  may  be 
connected  via  a  mm  wave  backhaul  network.  Users  in  a  picocell  could  talk  to  the  base  stations 
using  LTE,  or  perhaps  even  mm  wave.  Users  not  covered  by  picocells  talk  to  the  macrocell  using 
LTE. 

Cooperative communication: While we have restricted attention in this book to the study of
communication between a single transmitter and receiver, this provides a building block for emerging
ideas in cooperative communication. For example, neighboring nodes could form a virtual
antenna array, forming distributed MIMO (DMIMO) systems with significantly improved
power-bandwidth tradeoffs. This allows us, for example, to bring the benefits of MIMO to systems
with low carrier frequencies which propagate well over large distances, but are not compatible
with centralized antenna arrays because of the large carrier wavelength. For example, the
wavelength at 50 MHz is 6 meters, hence conventional antenna arrays would be extremely bulky, but
neighboring nodes naturally spaced by tens of meters could form a DMIMO array. DMIMO at
low carrier frequencies is a promising approach for bridging the digital divide in cost-effective
fashion, providing interference suppression and multiplexing capabilities as in MIMO, along with
link ranges of tens of kilometers. Another promising example of cooperative communication is
interference alignment, in which multiple transmitters, each of which is sending to a different
receiver, coordinate so as to ensure that the interference they generate for each other is limited
in time-frequency space. Of course, realizing the benefits of cooperative communication requires
fundamental advances in distributed synchronization and channel estimation, along with new
network protocols that support these innovations. More good news for the next generation of
communication engineers!

Figure 8.28: The potential role of small cells and mm wave communication in future cellular
systems (figure courtesy Dinesh Ramasamy).

Full-duplex communication: Most communication transceivers cannot transmit and receive at
the same time on the same frequency band (or even closely spaced bands). This is because even
a small amount of leakage from the transmit chain can swamp out the received signal, which is
much weaker. Thus, communication networks typically operate in time division duplexed (TDD)
mode (also more loosely termed half duplex mode), in which the transmitter and receiver use the
same band, but are not active at the same time, or in frequency division duplexed (FDD) mode,
in which the transmitter and receiver may be simultaneously active, but in different, typically
widely separated, bands. There has been promising progress recently, however, on relatively
low-cost approaches to canceling interference from the transmit chain, seeking to make full duplex
operation (i.e., sending and receiving at the same time in the same band) feasible. If these
techniques turn out to be robust and practical, then they could lead to significant performance
enhancements in wireless networks. Of course, networking with full duplex links requires revisiting
current protocols, which are based on either TDD or FDD.

Challenging  channels:  While  we  have  discussed  issues  related  to  significant  improvements  in 
wireless  data  rates  and  ranges  relative  to  large-scale  commercial  wireless  networks  today,  there 
are  important  applications  where  simply  forming  and  maintaining  a  viable  link  is  a  challenge. 
Examples  include  underwater  acoustic  networks  (for  sensing  and  exploration  in  oceans,  rivers 
and  lakes)  and  body  area  networks  (for  continuous  health  monitoring). 

Wireless-enabled multi-agent systems: Wireless is at the heart of many emerging systems that
require communication and coordination between a variety of "agents" (these may be machines
or humans). Examples include asset tracking and inventory management using radio frequency
identification (RFID) tags; sensor networks for automation in manufacturing, environmental
monitoring, healthcare and assisted living; vehicular communication; smart grid; and nascent
concepts such as autonomous robot swarms. Such "multi-agent" systems rely on wireless to
provide tetherless connectivity among agents, as well as to possibly provide radar-style
measurements, hence characterization and optimization of the wireless network in each specific context
is essential for sound system design.


Scaling  mostly  digital  transceivers 

As  discussed  in  Chapter  1,  a  key  technology  story  that  has  driven  the  growth  of  communication 
systems  is  Moore’s  law,  which  allows  us  to  inexpensively  implement  sophisticated  DSP  algorithms 
in  communication  transceivers.  A  modern  “mostly  digital”  receiver  typically  has  analog-to- 
digital  converters  (ADCs)  representing  each  I  and  Q  sample  with  8-12  bits  of  precision.  As 
communication  data  rates  and  bandwidths  increase,  Moore’s  law  will  probably  be  able  to  keep 
up  for  a  while  longer,  but  the  ADC  becomes  a  bottleneck  as  signal  bandwidths,  and  hence 
the  required  sampling  rates,  scale  to  GHz  and  beyond.  High-speed,  high-precision  ADCs  are 
power-hungry,  occupy  large  chip  areas  (and  are  therefore  expensive),  and  are  difficult  to  build. 
Thus,  a  major  open  question  in  communication  systems  is  whether  we  can  continue  to  enjoy  the 
economies  of  scale  provided  by  “mostly  digital”  architectures  as  communication  bandwidths  go 
up. 

We  have  already  discussed  the  potential  for  mm  wave  communication  and  its  unique  challenges. 
The  ADC  bottleneck  is  one  more  challenge  we  must  add  to  the  list,  but  this  challenge  applies 
to  any  communication  system  which  seeks  to  employ  DSP  while  scaling  up  bandwidth.  An 
important  example  is  fiber  optic  communication,  where  the  bandwidths  involved  are  huge.  For 
the  longest  time,  these  systems  have  operated  using  elementary  signaling  schemes  such  as  on-off 
keying,  with  mostly  analog  processing  at  the  receiver.  However,  researchers  are  now  seeking  to 
bring  the  sophistication  of  wireless  transceivers  to  optical  communication.  By  making  optical 
communication  more  spectrally  efficient,  we  can  increase  data  rates  on  fibers  already  buried  in  the 
ground,  simply  by  replacing  the  transceivers  at  each  end.  Sophisticated  algorithms  are  critical 
for  achieving  this,  and  these  are  best  implemented  in  DSP.  Furthermore,  by  making  optical 
transceivers  mostly  digital,  we  could  obtain  the  economies  of  scale  required  for  high-volume 
applications  such  as  very  short-range  chip-to-chip,  or  intra-chip,  communication.  Compared 
to wireless communication, optical communication presents special challenges due to fiber
nonlinearities and because of its higher bandwidth, while not facing the difficulties arising from
mobility.

Yet another area where we seek increased speed and sophistication is wired backplane
communication for interconnecting hardware modules on a circuit board (e.g., inputs and outputs for a
high-speed router, or processor and memory modules in a computer), and "networks on chip" for
communicating between modules on a single integrated circuit (e.g., for a "multi-core" processor
chip with multiple processor and memory modules).

How  does  one  overcome  the  ADC  bottleneck?  We  do  not  have  answers  yet,  but  there  are  some 
natural  ideas  to  try.  One  possibility  is  to  try  and  get  by  with  fewer  bits  of  precision  per  sample. 
Severe  quantization  introduces  a  significant  nonlinearity,  but  it  is  possible  that  we  could  still 
extract  enough  information  for  reliable  communication  if  the  dynamic  range  of  the  analog  signal 
being  quantized  is  not  too  large.  Of  course,  the  algorithms  that  we  have  seen  in  this  textbook 
(e.g.,  for  demodulation,  linear  equalization,  MIMO  processing)  all  rely  on  the  linearity  of  the 
channel  not  being  disturbed  by  the  ADC,  an  excellent  approximation  for  high-precision  ADC. 
This  assumption  now  needs  to  be  thrown  out:  in  essence,  we  must  “redo”  DSP  for  communication 
if  we  are  going  to  live  with  low-precision  ADC  at  the  receiver.  Another  possibility  is  to  parallelize: 
we  could  implement  a  high-speed  ADC  by  running  lower  speed  ADCs  in  parallel,  or  we  could 
decompose  the  communication  signal  in  the  frequency  domain,  such  that  relatively  low-speed 
ADCs  with  high  precision  can  be  used  in  parallel  for  different  subbands.  These  are  areas  of 
active  research. 
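
To get a feel for what is given up when ADC precision is reduced, the following sketch quantizes a test signal with an idealized b-bit uniform quantizer and reports the signal-to-quantization-noise ratio (SQNR). The uniformly distributed test signal, the mid-point quantizer model, and all variable names are illustrative assumptions; a real ADC driven by a real communication signal will behave differently, particularly at very low precision.

```matlab
% Illustrative sketch: SQNR of an idealized b-bit uniform quantizer
nsamp = 1e5;
s = 2*rand(1,nsamp) - 1;                  % assumed test signal, uniform on (-1,1)
bits = [1 2 4 8 12];                      % ADC precisions to compare
sqnr_db = zeros(size(bits));
for k = 1:length(bits)
    L = 2^bits(k);                        % number of quantization levels
    delta = 2/L;                          % step size over the range (-1,1)
    idx = min(floor((s + 1)/delta), L-1); % quantization cell index, 0..L-1
    sq = -1 + (idx + 0.5)*delta;          % reconstruct at the cell mid-point
    sqnr_db(k) = 10*log10(mean(s.^2)/mean((s - sq).^2));
    fprintf('%2d bits: SQNR = %5.1f dB\n', bits(k), sqnr_db(k));
end
```

For this idealized setting, each additional bit buys roughly 6 dB of SQNR, so dropping from 8 bits to 1 or 2 bits makes quantization noise comparable to, or larger than, the channel noise at typical operating SNRs.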


Beyond  Moore’s  law 

Moore’s  law  has  been  working  because  the  semiconductor  industry  keeps  managing  to  shrink 
feature  sizes  on  integrated  circuits,  making  transistors  (which  are  then  used  as  building  blocks 
for  digital  logic)  tinier  and  tinier.  Many  in  the  industry  now  say  that  the  time  is  approaching 
when shrinking transistors in this fashion will make their behavior non-deterministic (i.e., their
output  can  have  errors,  just  like  the  output  of  a  demodulator  in  a  communication  system).  Doing 
deterministic  logic  computations  with  non-deterministic  units  is  a  serious  challenge,  which  looks 
to  be  more  difficult  than  reliable  communication  over  a  noisy  channel.  However,  it  is  an  intriguing 
question  as  to  whether  it  is  possible  to  use  ideas  similar  to  those  in  digital  communication  to 
evolve  new  paradigms  for  reliable  computation  with  unreliable  units.  This  is  a  grand  challenge 
to  which  experts  in  communication  systems  might  be  able  to  make  significant  contributions. 


Parting  thoughts 

The  introductory  treatment  in  this  textbook  is  intended  to  serve  as  a  gateway  to  an  exciting  future 
in  communications  research  and  technology  development.  We  hope  that  this  discussion  gives  the 
reader  the  motivation  for  further  study  in  this  area,  using,  for  example,  more  advanced  textbooks 
and  the  research  literature.  We  do  not  provide  specific  references  for  the  topics  mentioned  in 
this  epilogue,  because  research  in  many  of  these  areas  is  evolving  too  rapidly  for  a  few  books 
or  papers  to  do  it  justice.  Of  course,  the  discussion  does  provide  plenty  of  keywords  for  online 
searches,  which  should  bring  up  interesting  material  to  follow  up  on. 